The Applicants respectfully request reconsideration of the present application in view of
the reasons that follow.

I. Status of the claims 

No claims are amended, added or canceled in this paper. Accordingly, claims 1-4, 6-7,
11, 14-20, 23, 26-32, 34, 36, 44-55, and 142-144 remain pending in this application, with claims
3, 4, 6, 7 and 142-144 under examination.

II. Claim rejection - 35 U.S.C. § 101

Claims 3, 4, 6, 7 and 142-144 are rejected under 35 U.S.C. § 101 as allegedly failing to
meet the utility requirement. Specifically, the Office Action asserts that a skilled artisan would
not reasonably conclude that the claimed polynucleotides can be used as a marker for brain tissue
because "SEQ ID NO: 56 has a mere 2-fold higher expression in brain than in the reference
sample." Office Action at page 3. The Applicants respectfully traverse this ground for rejection.

As noted in the previous reply, credibility is assessed from the perspective of one of ordinary
skill in the art. A two-fold difference in expression level is more than sufficient to provide the
skilled artisan with a reasonable expectation of successfully distinguishing brain tissue from other
tissue types using SEQ ID NO: 56 in a microarray analysis.

Regarding the use of SEQ ID NO: 56 as a marker for brain tissue, the specification 
describes the microarray analysis which illustrates the tissue-specific expression of SEQ ID NO: 
56 as follows. 
RNA samples isolated from a variety of normal human tissues
were compared to a common reference sample. Tissues
contributing to the reference sample were selected for the ability to
provide a complete distribution of RNA in the human body and
include brain (4%), heart (7%), kidney (3%), lung (8%), placenta
(46%), small intestine (9%), spleen (3%), stomach (6%), testis
(9%), and uterus (5%). The normal tissues assayed were obtained
from at least three different donors. RNA from each donor was
separately isolated and individually hybridized to the microarray.
Since these hybridization experiments were conducted using a
common reference sample, differential expression values are
directly comparable from one tissue to another.

Specification at page 102, lines 3-12. Thus, a variety of tissues from "at least three different
donors" was tested, and the expression of SEQ ID NO: 56 was found to be increased by "at least
two-fold" in brain as compared to the reference sample, which was selected to provide "a
complete distribution of RNA in the human body." Specification at page 102, lines 9, 17 and 5-7,
respectively.

The Examiner alleges that a "mere two-fold increase" in expression level is an 
insufficient difference to allow the skilled artisan to distinguish brain tissue from other tissues of 
the body (e.g., heart, kidney, lung, placenta, small intestine, spleen, stomach, testis and uterus). 
The Applicants respectfully disagree with this blanket characterization of significance and argue
that the skilled artisan would consider, and indeed does consider, an "at least two-fold"
difference in expression level significant.

For example, numerous microarray studies have deemed fold-difference values of
between 1.4 and 2 fold as significant. See, e.g., (1) Yue et al., Nucleic Acids Research, 29(8):e41
(2001), reporting a 1.4-fold change in expression as significant, see abstract (EXHIBIT A);
(2) Lee et al., Science, 285:1390-93, page 1392 (1999), reporting 1.5-fold induction and Mjold
reduction in gene expression as significant (EXHIBIT B); and (3) Vasseur et al., Molecular
Cancer, 2(19) (2003), stating at page 2 that "differential expression values of greater than 1.7 are
likely to be significant, based on internal quality control data," but noting that a "more stringent
ratio" of "at least 2.0 fold" was used (EXHIBIT C). Indeed, reviews on the topic conclude that
"there is no magical absolute cut-off for a meaningful fold value" and that, essentially, the
parameters of each analysis must be considered in determining a meaningful cut-off value for
that particular analysis. See, e.g., Tsien et al., "On reporting fold differences," Pacific Symposium
on Biocomputing, 6:496-507, at 504 (2001) (EXHIBIT D). Accordingly, the skilled artisan
would consider an expression differential of "at least two-fold" a more than reasonable cut-off to
distinguish brain tissue from other tissue types in a microarray analysis using SEQ ID NO: 56.
As such, the noted utility is credible.

The Applicants respectfully contend that the Examiner impermissibly raises the utility 
standard to something which it is not. With respect to utility, the M.P.E.P. states as follows: 

The claimed invention must only be capable of performing some
beneficial function ... An invention does not lack utility merely
because the particular embodiment disclosed in the patent lacks
perfection or performs crudely ... A commercially successful
product is not required .... Nor is it essential that the invention
accomplish all its intended functions ... or operate under all
conditions ... partial success being sufficient to demonstrate
patentable utility ... In short, the defense of non-utility cannot be
sustained without proof of total incapacity. If an invention is only
partially successful in achieving a useful result, a rejection of the
claimed invention as a whole based on a lack of utility is not
appropriate.

M.P.E.P. § 2107.01, II (citations omitted, emphasis added). Thus, while a higher "fold
difference" may be required in some circumstances, e.g., FDA approval, such conditions are
not required to meet the utility requirement under 35 U.S.C. § 101. In the present case, the
expression level of SEQ ID NO: 56 is "at least two-fold higher" in brain tissue as compared to
the control sample, and the art recognizes an "at least two-fold" difference in expression as
significant, i.e., credible.
Accordingly, for at least the reasons stated above, the claimed invention has a specific,
substantial and credible utility, and reconsideration and withdrawal of the rejection under
35 U.S.C. § 101 is respectfully requested.

III. Claim rejection - 35 U.S.C. § 112, first paragraph

Claims 3, 4, 6, 7 and 142-144 are rejected under 35 U.S.C. § 112, first paragraph, as
allegedly failing to meet the enablement requirement. The Office Action asserts that "since the
claimed invention is not supported by either a specific and substantial asserted utility or a
well-established utility... one skilled in the art would not know how to use the claimed invention."
Office Action at page 3. The Applicants respectfully traverse this ground for rejection.

As noted above in section II, the claimed invention has a specific, substantial and credible 
utility. Accordingly, the reason for rejection is moot, and reconsideration and withdrawal is 
respectfully requested. 

IV. Conclusion 

The present application is now in condition for allowance. Favorable reconsideration of 
the application is respectfully requested. 

The Examiner is invited to contact the undersigned by telephone if it is felt that a 
telephone interview would advance the prosecution of the present application. 

The Commissioner is hereby authorized to charge any additional fees which may be
required regarding this application under 37 C.F.R. §§ 1.16-1.17, or credit any overpayment, to
Deposit Account No. 19-0741. Should no proper payment be enclosed herewith, as by the credit
card payment instructions in EFS-Web being incorrect or absent, resulting in a rejected or
incorrect credit card transaction, the Commissioner is authorized to charge the unpaid amount to
Deposit Account No. 19-0741.

If any extensions of time are needed for timely acceptance of papers submitted herewith,
the Applicants hereby petition for such extension under 37 C.F.R. § 1.136 and authorize payment
of any such extension fees to Deposit Account No. 19-0741.

Respectfully submitted, 



Date: January 13, 2010 

FOLEY & LARDNER LLP 
Customer Number: 22428 
Telephone: (202) 672-5538 
Facsimile: (202) 672-5399 



By: /Stephanie H. Vavra/ Reg. No. 45,178 

for Michele M. Simkin
Attorney for the Applicants 
Registration No. 34,717 









© 2001 Oxford University Press 



Nucleic Acids Research, 2001, Vol. 29, No. 8 e41 



An evaluation of the performance of cDNA microarrays 
for detecting changes in global mRNA expression 

Huibin Yue, P. Scott Eastman*, Bruce B. Wang, James Minor, Michael H. Doctolero, 
Rachel L. Nuttall, Robert Stack, John W. Becker, Julie R. Montgomery, Marina Vainer and 
Rick Johnston 

Advanced Research Group, Incyte Genomics, 6519 Dumbarton Circle, Fremont, CA 94555, USA 
Received as resubmission February 13, 2001; Accepted February 18, 2001 



ABSTRACT 

The cDNA microarray is one technological approach 
that has the potential to accurately measure changes 
in global mRNA expression levels. We report an 
assessment of an optimized cDNA microarray platform 
to generate accurate, precise and reliable data 
consistent with the objective of using microarrays as 
an acquisition platform to populate gene expression 
databases. The study design consisted of two inde- 
pendent evaluations with 70 arrays from two different 
manufactured lots and used three human tissue 
sources as samples: placenta, brain and heart. 
Overall signal response was linear over three orders 
of magnitude and the sensitivity for any element was 
estimated to be 2 pg mRNA. The calculated coefficient 
of variation for differential expression for all non- 
differentiated elements was 12-14% across the entire 
signal range and did not vary with array batch or 
tissue source. The minimum detectable fold change 
for differential expression was 1.4. Accuracy, in 
terms of bias (observed minus expected differential 
expression ratio), was less than 1 part in 10 000 for all 
non-differentiated elements. The results presented in 
this report demonstrate the reproducible performance 
of the cDNA microarray technology platform and the 
methods provide a useful framework for evaluating 
other technologies that monitor changes in global 
mRNA expression. 

INTRODUCTION 

The construction of gene expression databases is a high 
priority of today's research community. Such databases, 
closely integrated with other types of genomic information, 
promise not only to facilitate our understanding of many 
fundamental biological processes, but also to accelerate drug 
discovery and lead to customized diagnosis and treatment of 
disease (1-6). 

These databases will require the development of one or more 
underlying supporting technologies that can accurately and 
reproducibly measure changes in global mRNA expression
levels. The ideal technology should be able to process large
numbers of samples, require minimal amounts of biological 
source material and be applicable across a wide range of cell or 
tissue types. Several different technologies are currently being 
investigated for their ability to meet these stringent require- 
ments (7-12). While many of these technologies show significant 
promise in preliminary studies, it is critically important that 
each technology be comprehensively evaluated as a complete 
system for producing accurate, precise and reliable expression 
data (13,14). 

The Incyte cDNA microarray technology platform simulta- 
neously analyzes the relative expression levels of up to 10 000 
genes, each of which is present as a unique cDNA element (7). 
The platform is potentially scalable to include all of the 
elements in the human genome. PCR-derived elements 
averaging 1000 nt in length are physically arrayed in a two- 
dimensional grid on a chemically modified glass slide. Aliquots 
from two purified mRNA samples are separately reverse tran- 
scribed using primer sets labeled with two different fluoro- 
phores and the resulting dye-labeled cDNA populations are 
used to probe the target elements in a competitive hybridization 
reaction. After hybridization the glass slide is analyzed in a 
two-channel fluorescence scanner and the ratio between the 
two fluorophores detected for any given element defines the 
relative amount of the mRNA corresponding to that element 
present in the original two samples. 

There are many process variables that will impact on the 
quality of the data generated by any microarray technology 
platform. In this report we describe parameters for the 
manufacture of effective cDNA microarrays with highly repro- 
ducible performance characteristics, the quality and quantity of 
sample mRNAs used to create the dye-labeled cDNA probe 
and the effects of these optimized procedures on the overall 
performance, accuracy, precision and reliability of expression 
data generated from the two-channel ratiometric approach. 

MATERIALS AND METHODS 

Synthesis of PCR products 

PCR was used to generate large quantities of defined target 
DNA for microarray production. Plasmids containing cloned 
genes were grown in Escherichia coli and were amplified 
using vector primers SK536 (5'-GCGAAAGGGGGATGTGCTG-3')
and SK865 (5'-GCTCGTATGTTGTGTGGAA-3')
(Operon Technologies, Alameda, CA). Briefly, 1 µl of bacterial
cell culture was added to 75 µl of reaction buffer, containing
10 mM Tris-HCl pH 8.3, 1.5 mM MgCl2, 50 mM KCl, 0.2 mM
each dNTP, 0.5 µM each primer and 2 U Taq polymerase. The
mixture was incubated for 3 min at 95°C and 30 cycles of PCR
were performed at 94°C for 30 s, 56°C for 30 s and 72°C for
90 s. A final incubation for 5 min at 72°C was followed by
reduction of the temperature to 4°C in order to terminate the
reaction. PCR products were then purified by centrifugal
chromatography with Sephadex S400 resin (Amersham-
Pharmacia Biotech, Uppsala, Sweden) in a 96-well format.
Briefly, 400 µl of S400 resin pre-equilibrated in 0.2x standard
saline citrate buffer (SSC) was added to each well of a 96-well
microtiter plate. A unique PCR product prepared as described
above was loaded into each well and the plate was centrifuged
in an Eppendorf 5810 centrifuge at 885 r.c.f. (relative centrifugal
force). Purified PCR products were concentrated to dryness
and resuspended in 10 µl of H2O. DNA was resolubilized by
thermal cycling (five cycles of 85°C for 30 s and 20°C for 30 s).

Qualification and quantification of PCR products 

PCR products were routinely analyzed for quality by agarose 
gel electrophoresis and samples that failed to amplify or had 
multiple bands were annotated in the GEMTools database 
management software (Incyte Genomics, Fremont, CA). PCR 
products were quantified using PicoGreen dye (Molecular 
Probes, Eugene, OR) in a fluorescent assay specific for 
measuring double-stranded DNA concentration according to 
the manufacturer's instructions. 

Arraying and post-processing 

Ten thousand PCR products were arrayed by high speed 
robotics (7) on amino-modified glass slides (M.Reynolds, 
unpublished results). Each element occupied a spot of ~150 µm in
diameter and spot centers were 170 µm apart. DNA adhesion
to the glass was achieved by irradiation in a Stratalinker Model
2400 UV illuminator (Stratagene, San Diego, CA) with light at
254 nm and an energy output of 120 000 µJ/cm². To minimize
any potential non-specific probe interactions with the glass the
microarrays were washed for 2 min in 0.2% SDS (Life Tech-
nologies, Rockville, MD), followed by three rinses in H2O for
1 min each. The microarrays were treated with 0.2% (w/v) I-block
(Tropix, Bedford, MA) in phosphate-buffered saline (PBS) for
30 min at 60°C. They were washed again for 2 min in 0.2%
SDS, rinsed three times in H2O for 1 min each and finally dried
by a brief centrifugation. Dried microarrays were routinely 
stored in opaque plastic slide boxes at room temperature. 

Array qualification: SYTO 61 dye 

As SYTO 61 nucleic acid staining has generally been applied 
to cells, the standard procedure was modified to allow its use 
for measurement of DNA bound to microarrays. A 5 µM stock
solution of SYTO 61 dye (Molecular Probes) in DMSO was
diluted 1:100 in 10 mM Tris-HCl pH 7, 0.1 mM EDTA (TE).
Several microarrays from each manufactured batch were
immersed in this solution for 5 min at room temperature, rinsed
with TE, rinsed with H2O and finally with absolute ethanol.
After drying the microarrays were scanned on a GenePix 
4000A scanner (Axon Instruments, Foster City, CA) at 535 nm. 



mRNA preparation and probe synthesis 

Briefly, mRNA was isolated by a single round of poly(A) 
selection using Oligotex resin (Qiagen, Valencia, CA) from 
commercially available human placenta, brain and heart total 
RNA (Biochain, San Leandro, CA). The purified mRNA was 
quantified using RiboGreen dye (Molecular Probes) in a 
fluorescent assay. RiboGreen dye was diluted 1:200 (v/v final) 
and mixed with known RNA concentrations (determined by 
absorbance at 260 nm) ranging from 1 to 5000 ng/ml. A 
Millennium RNA size ladder (Ambion, Austin, TX) was used 
to generate standard curves and unknown samples were diluted 
as necessary. Fluorescence was measured in 96-well plates 
with a FLUOstar fluorometer (BMG Lab Technologies, 
Germany) fitted with 485 nm (excitation) and 520 nm (emission) 
filters. 
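The RiboGreen quantification described above is a standard-curve assay: fluorescence of known RNA concentrations is fit to a line, and each unknown is read off the inverted fit and corrected for any dilution applied to it. The following is a minimal sketch, assuming a linear response across the standard range; the readings, the function names and the 1:10 dilution are hypothetical and not taken from the paper.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration(signal, slope, intercept, dilution_factor=1.0):
    """Read a concentration off the inverted standard curve, undoing any dilution."""
    return (signal - intercept) / slope * dilution_factor


# Hypothetical standards (ng/ml) spanning the 1-5000 ng/ml range, with
# made-up fluorescence readings that happen to follow 2 * x + 10 exactly.
standards = [1.0, 10.0, 100.0, 1000.0, 5000.0]
readings = [12.0, 30.0, 210.0, 2010.0, 10010.0]
slope, intercept = fit_line(standards, readings)

# A hypothetical unknown diluted 1:10 so that its reading falls on the curve.
unknown = concentration(810.0, slope, intercept, dilution_factor=10.0)
print(round(slope, 3), round(intercept, 3), round(unknown, 1))  # 2.0 10.0 4000.0
```

Diluting the unknown into the calibrated range and multiplying back, rather than extrapolating the line, mirrors the text's note that unknown samples were diluted as necessary.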

Between 25 and 100 ng mRNA were separated on an Agilent 
2100 Bioanalyzer, a high resolution electrophoresis system 
(Agilent Technologies, Palo Alto, CA), to examine the mRNA 
size distribution. 200 ng of purified mRNA were converted to 
either a Cy3- or Cy5-labeled cDNA probe using a custom 
labeling kit (Incyte Genomics). Each reaction contained 
50 mM Tris-HCl pH 8.3, 75 mM KCl, 15 mM MgCl2, 4 mM
DTT, 2 mM dNTPs (0.5 mM each), 2 µg Cy3- or Cy5-random
9mer (Trilink, San Diego, CA), 20 U RNase inhibitor 
(Ambion), 200 U MMLV RNase H-free reverse transcriptase 
(Promega, Madison, WI) and mRNA. Correspondingly labeled 
Cy3 and Cy5 cDNA products were combined and purified on a 
size exclusion column, concentrated by ethanol precipitation 
and resuspended in hybridization buffer. 

Array qualification: complex and vector-specific 
hybridizations 

Hybridization of labeled cDNA probes was performed in 20 µl of
5x SSC, 0.1% SDS, 1 mM DTT at 60°C for 6 h. Hybridization
with a Cy3-labeled vector-specific oligonucleotide
(5'-TTCGAGCTTGGCGTAATCATGGTCATAGCTGTTTCCTGTGTGAAATTGTTATCCGCTCA-3')
(Operon Technologies) was
performed at 10 ng/µl in 5x SSC, 0.1% SDS, 1 mM DTT at
60°C for 1 h. The microarrays were washed after hybridization
in 1x SSC, 0.1% SDS, 1 mM DTT at 45°C for 10 min and then
in 0.1x SSC, 0.2% SDS, 1 mM DTT at room temperature for
3 min. After drying by centrifugation, microarrays were scanned
with an Axon GenePix 4000A fluorescence reader and GenePix
image acquisition software (Axon) at 535 nm for Cy3 and 625 nm
for Cy5. An image analysis algorithm in GEMTools software 
(Incyte Genomics) was used to quantify signal and background 
intensity for each target element. The ratio of the two corrected 
signal intensities was calculated and used as the differential 
expression ratio (DE) for this specific gene in the two mRNA 
samples. 
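The DE calculation described above reduces to a ratio of background-subtracted intensities, with Cy3 in the numerator (the text later defines DE as Cy3 fluorescence divided by Cy5 fluorescence). A minimal sketch follows; the `floor` applied to non-positive corrected signals is an assumption, since the paper does not say how spots at or below background are handled.

```python
import math


def differential_expression(cy3_signal, cy3_background,
                            cy5_signal, cy5_background, floor=1.0):
    """DE = background-corrected Cy3 / background-corrected Cy5.

    `floor` is an assumed minimum corrected signal, so that a spot at or
    below background cannot cause division by zero or a negative ratio.
    """
    cy3 = max(cy3_signal - cy3_background, floor)
    cy5 = max(cy5_signal - cy5_background, floor)
    return cy3 / cy5


de = differential_expression(2150.0, 150.0, 1100.0, 100.0)
print(de, math.log2(de))  # 2.0 1.0
```

A two-fold DE corresponds to a log2 ratio of 1, which is why fold-change cut-offs are often stated on a log scale.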

The Axon scanner was calibrated using a primary standard 
and a secondary standard to account for the differences in 
scanner performance [laser and photomultiplier tube (PMT)] 
between the Cy3 and Cy5 channels. For the primary standard 
hundreds of probe samples were prepared that were fluores- 
cently balanced in the Cy3 and Cy5 channels as determined by 
a Fluorolog3 fluorescence spectrophotometer (Instruments 
SA, Edison, NJ). These probes were hybridized to microarrays 
and the scanner PMTs were adjusted to give balanced fluores- 
cence and the greatest dynamic range. Using these PMT values 
a fluorescent plastic slide was scanned to obtain corresponding
fluorescent values. This secondary standard was used to 
calibrate scanners on a daily basis. 

Data acquisition and analysis 

Two low frequency data correction algorithms were applied to 
compensate for systematic variations in data quality. The first 
procedure, a gradient correction algorithm, modeled the signal 
response surfaces of each channel. On a 10 000 element micro- 
array the signal responses of Cy3 and Cy5 should be random 
due to the random physical location of the target elements. The 
signal response surfaces were first examined for non-random 
patterns. If non-random patterns were detected, a second order 
response model was applied to model the gene signal 
responses according to their positions on the surface. The non- 
randomness was then corrected using the fitted model. The 
second procedure, a signal correction algorithm, corrected for 
differential rates of incorporation of the Cy3 and Cy5 dyes. In 
an idealized homotypic hybridization, a scatter plot of log Cy3 
signal versus log Cy5 signal should show a signal distribution 
along a line with a slope of 1. If the center line of the signals 
does not have a slope of 1 there may be different rates of 
incorporation of the Cy3 and Cy5 dyes. The signal correction 
algorithm tested whether the slope of the regression line for log 
Cy3 signal versus log Cy5 signal was 1 and applied a regression 
model to rotate the regression line to a slope of 1 if necessary. 
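The signal correction step can be sketched as follows, assuming ordinary least squares for the regression of log Cy3 on log Cy5. The paper describes rotating the regression line to a slope of 1 but does not give its model; this sketch substitutes a simple pivot about the mean log Cy5 signal, which forces the corrected slope to exactly 1 and leaves the mean log Cy3 unchanged. The fixed slope `tolerance` stands in for a formal test of slope = 1 and is hypothetical.

```python
import math


def ols_slope_intercept(xs, ys):
    """Ordinary least-squares fit of ys on xs; returns (slope, intercept)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def correct_dye_bias(cy3, cy5, tolerance=0.02):
    """Return Cy3 signals adjusted so log Cy3 vs log Cy5 regresses with slope 1.

    `tolerance` is a hypothetical stand-in for a formal test of slope == 1.
    """
    log3 = [math.log10(v) for v in cy3]
    log5 = [math.log10(v) for v in cy5]
    slope, _ = ols_slope_intercept(log5, log3)
    if abs(slope - 1.0) <= tolerance:
        return list(cy3)  # slope already ~1: leave the data alone
    mean5 = sum(log5) / len(log5)
    # Pivot about the mean log Cy5: adding (1 - slope) * (x - mean5) raises the
    # regression slope to exactly 1 without changing the mean log Cy3.
    corrected = [y + (1.0 - slope) * (x - mean5) for x, y in zip(log5, log3)]
    return [10 ** y for y in corrected]


# Simulated dye bias: log Cy3 = 0.9 * log Cy5.
cy5 = [100.0, 1000.0, 10000.0]
cy3 = [10 ** (0.9 * math.log10(v)) for v in cy5]
print([round(math.log10(v), 3) for v in correct_dye_bias(cy3, cy5)])  # [1.7, 2.7, 3.7]
```

Working in log space matters here: a multiplicative dye-incorporation difference that depends on signal level shows up as a slope different from 1 on the log-log scatter plot.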



Figure 1. Impact of input DNA concentration on differential gene expression.
A dilution series of PCR product for three yeast control fragments was arrayed
in triplicate in each of four quadrants. The amount of PCR product in the well prior
to arraying is indicated above each panel. Input RNA ratios for labeling with Cy3
versus Cy5 for the three fragments were 30:1, 1:1 and 1:30. The log10 of observed
differential expression is plotted as a function of log10 of input RNA ratios.



RESULTS AND DISCUSSION 

Impact of arrayed DNA concentration on DEs 

Because of the competitive nature of two channel fluorescent 
hybridizations it has been assumed that the amount of target 
DNA deposited on the glass slide would have little or no 
impact on any observed DEs (15). We tested this assumption 
directly by hybridizing a series of samples at predetermined 
input ratios to microarrays containing varying amounts of 
target DNA. For these experiments the target DNAs were yeast 
fragments, a set of PCR products derived from the non-coding 
regions of Saccharomyces cerevisiae. The amount of PCR 
product was quantified using a fluorescent dye (PicoGreen) 
specific for double-stranded DNA. The targets were spotted in 
three sets containing quadruplicate points from a 2-fold 
dilution series of DNA concentrations ranging from 2.0 to 
0.062 µg/well (10 µl/well).

Probes for hybridization to the yeast fragments were made 
from T7 RNA transcripts of PCR products. Templates for in 
vitro transcription were made by incorporating a T7 promoter 
in the upstream PCR primer and poly(A) sequences in the down- 
stream PCR primer. In vitro transcripts of the yeast fragments 
were purified, quantified and included in every labeling reaction 
at predetermined Cy3:Cy5 input levels (fragment 22, 123:4 pg; 
fragment 6, 123:123 pg; fragment 25, 4:123 pg). All probe 
labeling reactions were done in the presence of 200 ng poly(A) 
mRNA, from either human brain or heart (Biochain, Hayward, 
CA). Hybridization of these probes was performed on three 
different days, across 20 microarrays representing two 
different batches and by multiple operators. A comparison of 
the expected differential expression and the experimentally 
observed differential expression is shown in Figure 1. These 
results indicate that target DNA arrayed at input concentrations
<1.0 µg/10 µl results in an underestimate or compression of the
observed differential expression, with more compression 
occurring at lower DNA concentrations. 
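The predetermined spike-in amounts translate directly into expected DE values, which is how the 30:1, 1:1 and 1:30 ratios in Figure 1 arise (123/4 pg is 30.75). One simple way to express compression is the fraction of the expected log ratio that is actually observed. That metric and the observed value of 10 used below are illustrative assumptions, not figures from the paper.

```python
import math

# Cy3:Cy5 input amounts (pg) of the yeast in vitro transcripts, as given in the text.
SPIKE_INPUTS_PG = {
    "fragment 22": (123, 4),
    "fragment 6": (123, 123),
    "fragment 25": (4, 123),
}


def expected_de(cy3_pg, cy5_pg):
    """Expected DE is simply the Cy3:Cy5 input ratio."""
    return cy3_pg / cy5_pg


def compression(observed_de, expected):
    """Fraction of the expected log ratio that was observed (1.0 = none).

    Illustrative metric only; undefined for an expected DE of 1.
    """
    return math.log10(observed_de) / math.log10(expected)


for name, (cy3_pg, cy5_pg) in SPIKE_INPUTS_PG.items():
    print(name, round(expected_de(cy3_pg, cy5_pg), 4))
# fragment 22 30.75
# fragment 6 1.0
# fragment 25 0.0325

# A hypothetical spot observed at DE = 10 instead of 30.75:
print(round(compression(10.0, expected_de(123, 4)), 2))  # 0.67
```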

Quantification of DNA amplimers on the array by a 
hybridization-independent method 

The DNA concentration of the input printing solutions may not 
be directly predictive of the amount of DNA actually retained 
on the glass. Variations in the transfer efficiency of individual 
DNA sequences to the glass and variations in its subsequent 
retention through the post-arraying and processing procedures 
may have an impact on the amount of DNA retained. Therefore, a 
second DNA staining assay was developed using SYTO 61 
fluorescent dye, which directly measured the amount of DNA 
actually retained on the glass, independent of hybridization. 

Qualification of 10 000 element cDNA microarrays 

Based on the preliminary experiments we applied the 
PicoGreen and SYTO 61 assays to evaluate two independent 
10 000 element microarrays (Fig. 2). Each of the 104 96-well 
plates used to print the arrays was qualified by PicoGreen 
analysis and all plate sets had high levels of PCR amplimer 
(>1.0 µg/well) (Fig. 2A). The plate sets used to prepare the
HGG1 arrays, however, had a greater overall average DNA
concentration than those used to prepare the UGV1 arrays:
median 3.6 versus 1.85 µg/well, respectively.

An array from each batch was hybridized with a complex 
cDNA probe derived from placenta RNA in both the Cy3 and 
Cy5 channels. SYTO 61 staining was performed on an additional 
array from each batch and a comparison of the signal outputs 
for SYTO 61 and hybridization probes for both array batches is
shown in Figure 2B and C. Observed hybridization signals
were generally higher for the HGG1 array (Fig. 2C) as
compared to the UGV1 array (Fig. 2B): median Cy3 1049
versus 310 relative fluorescence units (r.f.u.), median Cy5
1137 versus 302 r.f.u., respectively. This was consistent with
the higher amount of DNA on the glass for the HGG1 array:
median 2532 versus 1905 r.f.u. Higher hybridization signals
(>10 000 r.f.u.) were routinely observed when the amount of
target DNA bound to the glass approached 2000 r.f.u. by
SYTO 61 staining (data not shown). In the examples shown,
35% of the elements on the UGV1 microarray have SYTO 61
stain values <2000 r.f.u., as compared to only 9% of the
elements on the HGG1 array. There was an apparent discrepancy
in the UGV1 microarray: 65% of its elements had higher levels
of bound DNA, but few yielded hybridization signals >10 000 r.f.u.

Figure 2. Quality control analysis of microarray batches. A set of eight wells randomly selected from each of 104 96-well plates from microarray types UGV1 and
HGG1 was analyzed with PicoGreen. The distribution of DNA concentrations is shown in (A). The amount of hybridization signal with a complex probe (Cy3
Brain/Cy5 Heart) is shown as a function of the amount of DNA retained on the glass for microarray types UGV1 (B) and HGG1 (C). Signal distributions from
hybridizations with a vector-specific oligonucleotide probe for each array type are shown in (D).

To address this issue a third assay was developed. An array 
from each batch was hybridized with a Cy3-labeled oligo- 
nucleotide probe specific for the common vector sequence 
found in all the PCR products. The signal distribution for these 
vector hybridizations is presented in Figure 2D. The majority 
of elements on the UGV1 microarray had significantly lower
hybridization signals than the HGG1 array: median 1901
versus 6507 r.f.u. These results correlated better with complex 
probe hybridization than SYTO 61 staining (Fig. 2B and C). 

The manufacture of high quality, reproducible arrays with 
10 000 or more unique PCR products is an expensive and time- 
consuming effort. It requires considerable attention to the 
details of each step in the process and defined procedures to 
ensure quality and reproducibility. The data presented in this 
report show that low concentrations of DNA in the input 
printing solutions result in reduced amounts of arrayed DNA 
and this, in turn, reduces the dynamic signal range and 
produces an apparent compression or underestimation of 
differential expression. The assay procedures reported here 
have been implemented in the large-scale production of micro- 
arrays for use in generating expression databases. 

mRNA input 

The impact of varying the amount of input mRNA on net 
cDNA probe synthesis and hybridization was evaluated. 
Placental mRNAs of varying amounts (25^100 ng) were 
labeled with Cy3 and hybridized to an equal aliquot labeled with
Cy5. Increasing the placental mRNA input yielded increasing 
amounts of total cDNA product (Fig. 3A). Hybridization signal- 
to-background and dynamic range also increased as the mRNA 
input increased, although a clear point of 'diminishing returns' 
occurs above 200 ng mRNA input (Fig. 3B and C). Based on 
this mRNA titration series, we believe that using 200 ng 
mRNA as the standard input for labeling reactions is the 
optimal amount. A representative example of a competitive 
hybridization with balanced RNA inputs (200:200 ng) is 
presented in Figure 4A. 

We tested the effect of unbalanced competitive hybridization 
by hybridizing product prepared from different input levels of 
placental mRNA in the labeling process (Fig. 3D and E). We 
observed significant loss in precision and a distortion of the
population from the theoretical DE of 1, especially in the lower
signal range. This distortion reflects both the impact of differ- 
ential labeling and hybridization of transcripts with different 
amounts of mRNA input. Reversing the ratio of input mRNA 
for probe synthesis resulted in the opposite curvature (Fig. 3E). 
We conclude that accurate quantification and use of an 
equivalent mRNA mass for labeling in both channels is essential 
for optimum results. 

Homotypic response 

An estimate of the accuracy and precision of array-generated 
expression data was first made by performing a series of 
replicate experiments using various homotypic hybridizations. 
A competitive hybridization of fluorescently labeled Cy3 and
Cy5 cDNA, both prepared from the same placental mRNA,
should theoretically give a DE (or Cy3 fluorescence divided by
Cy5 fluorescence) of 1 for all 10 000 elements arrayed on the
slide. With replicate hybridizations we can evaluate the overall
precision of the data using various statistical parameters and
obtain an estimate of accuracy from any deviation(s) observed
from the theoretical value.

Figure 4. (A) Scatter plot of the calibrated Cy3 versus Cy5 fluorescence
response from a typical placenta:placenta hybridization. The diagonal line through
the origin corresponds to the expected DE of 1. The other diagonal lines define DE
values as indicated next to the line. (B) Histogram showing the distribution of
elements by log of their experimentally derived DEs for 10 homotypic placental
hybridizations.

A scatter plot of the Cy3- versus Cy5-calibrated fluorescent 
response from a single placenta:placenta hybridization is 
shown in Figure 4A. Virtually all gene elements lie close to the 
diagonal line corresponding to the expected DE of 1. Overall 
system response was observed to be linear over about three 
orders of magnitude. 

Approximately 100 000 data points from 10 homotypic
placenta hybridizations were used to construct a histogram showing
the frequency or distribution of gene elements (as a percentage of
the total) around the log of the expected DE (log 1.0 = 0). Effectively,
the histogram (Fig. 4B) is a graphical measure of the range of
the signal response for each selected element. The coefficient
of variation (CV), or relative standard deviation, provides a
quantitative estimate of the precision of differential expression.
The calculated CV for differential expression for all elements 
was 12% across the entire signal range. The same 12% variance 
was observed across two independently manufactured batches 
of cDNA microarrays (data not shown). 
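As a minimal Python sketch of the CV calculation described above, assuming the replicate DE values for each element are available as plain lists, the functions below compute a per-element CV and the average over elements. The function names, element names and numbers are hypothetical and are not the authors' software.

```python
from statistics import fmean, stdev


def coefficient_of_variation(values):
    """Relative standard deviation: sample SD divided by the mean."""
    mean = fmean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(values) / mean


def average_cv(replicates_by_element):
    """Mean CV across elements, each element holding its replicate DE values."""
    return fmean(coefficient_of_variation(reps) for reps in replicates_by_element.values())


# Hypothetical replicate DEs for two elements from homotypic hybridizations.
replicates = {
    "element_1": [0.95, 1.05, 1.00, 0.98, 1.02],
    "element_2": [0.88, 1.12, 1.00, 0.90, 1.10],
}
print(f"average CV = {average_cv(replicates):.1%}")
```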

Ten similar homotypic hybridization experiments were 
conducted with both human brain and heart samples and the 
data were compared to the placenta results described above. 
Results for both sets of hybridizations were identical (data not 
shown). The same 12% CV for differential expression was 
observed independent of tissue type over the entire signal 
range. 

Accuracy, in terms of bias, was estimated by calculating an average experimental DE directly from observed fluorescence output and comparing it to the expected value of 1.00. For each of the three tissue types above (placenta, brain and heart), the average (n = 10) experimental DE values were 0.999983, 0.99977 and 0.9998, respectively. The overall average was 0.9999, a deviation of about 1 part in 10 000. These values are in good agreement not only within the group, but also with the expected theoretical value of 1.00.

The observed variation in individual element responses (from the expected DE of 1, i.e. ln DE = 0) for 180 randomly selected genes across the full range of observed signal response (as a function of Cy5 signal) is shown in Figure 5A-C for placenta, brain and heart tissue. For each of the 180 elements selected, all 10 replicate data points are plotted for each gene from each tissue type. Regardless of tissue type, we observed few data points with a differential expression greater than 2, even at low overall signal levels.

From the above data we can calculate the change in DE required before the value has statistical significance. Mathematically, this can be written in terms of the two-sided statistical tolerance interval for the differential expression of non-differentiated elements (16). A statistical tolerance interval is one that contains a specified portion, p, of the entire sampled population with a specified degree of confidence, 100(1 - q)%. Table 1 shows the 99.5% tolerance intervals for 99% of the elements from each tissue type: all observed differential expression values fall between ±1.4.
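The formula behind reference 16 is not given here, so the Python sketch below swaps in a common approach: a two-sided normal tolerance interval on ln DE, using Howe's approximation for the tolerance factor k and a Wilson-Hilferty approximation for the chi-square quantile so that only the standard library is needed. The coverage p = 0.99 and confidence of 99.5% mirror the values quoted above; treating ln DE as normally distributed, and all function names, are assumptions.

```python
import math
from statistics import NormalDist, fmean, stdev


def chi2_lower_quantile(alpha, dof):
    """Approximate lower-tail chi-square quantile (Wilson-Hilferty)."""
    z = NormalDist().inv_cdf(alpha)
    h = 2.0 / (9.0 * dof)
    base = 1.0 - h + z * math.sqrt(h)
    if base <= 0:
        raise ValueError("approximation breaks down for so few data points")
    return dof * base ** 3


def tolerance_factor(n, coverage=0.99, confidence=0.995):
    """Howe's approximate k for a two-sided normal tolerance interval."""
    dof = n - 1
    z = NormalDist().inv_cdf((1.0 + coverage) / 2.0)
    # Chi-square value exceeded with probability `confidence`.
    chi2 = chi2_lower_quantile(1.0 - confidence, dof)
    return z * math.sqrt(dof * (1.0 + 1.0 / n) / chi2)


def de_tolerance_interval(de_values, coverage=0.99, confidence=0.995):
    """Interval computed on ln(DE), then mapped back to the DE scale."""
    logs = [math.log(v) for v in de_values]
    centre, spread = fmean(logs), stdev(logs)
    k = tolerance_factor(len(logs), coverage, confidence)
    return math.exp(centre - k * spread), math.exp(centre + k * spread)


# Hypothetical replicate DEs for one non-differentiated element.
lower, upper = de_tolerance_interval([0.9, 1.1, 1.0, 0.95, 1.05, 1.02, 0.98, 1.0, 0.97, 1.03])
print(f"{lower:.2f} to {upper:.2f}")
```

With very many data points (the text pools about 100 000), k approaches the normal quantile of about 2.576, so the interval is essentially the mean of ln DE plus or minus 2.576 standard deviations.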

Analysis of variance (ANOVA) was used to estimate the 
contribution of specific potential sources of variance to the 
overall variance measured. Analyses were performed by restricted maximum likelihood (REML) using the SAS for Windows v6.12 procedure PROC MIXED (17). All of the
homotypic placenta, brain and heart data sets were used for this 
analysis. 

There are four general sources of variation in the DE ratios: 
microarray batch, array-to-array hybridization variance 
(including sample preparation), biological source tissue and 
gene sequence variance. Table 2 lists the estimated contribution of 
these potential sources of variation to the overall variance 
measured. The two sources contributing most significantly to 
the overall variation were hybridization variance and sequence 
variance. Hybridization variance represents a source of variation 
from hybridization to hybridization. Sequence variance indicates 
that different elements demonstrate different levels of variation. 
Microarray batches and source tissues were not significant 
sources of variance. 
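PROC MIXED fits variance components by restricted maximum likelihood. As a simpler, swapped-in illustration of what a variance component is, the Python sketch below uses the classical method-of-moments (ANOVA) estimator for a balanced one-way random-effects layout, for example replicate ln DE values grouped by array batch. It is not the authors' REML model, and the grouping and numbers are hypothetical.

```python
from statistics import fmean


def one_way_variance_components(groups):
    """Method-of-moments variance components for a balanced one-way random-effects design.

    groups: equal-length lists, one per level of the grouping factor
    (e.g. one list of ln(DE) values per hypothetical array batch).
    Returns (between-group variance, within-group variance).
    """
    a = len(groups)
    n = len(groups[0])
    if a < 2 or n < 2 or any(len(g) != n for g in groups):
        raise ValueError("need a balanced design with at least 2 groups of 2")
    grand_mean = fmean(x for g in groups for x in g)
    group_means = [fmean(g) for g in groups]
    ss_between = n * sum((m - grand_mean) ** 2 for m in group_means)
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    ms_between = ss_between / (a - 1)
    ms_within = ss_within / (a * (n - 1))
    # A negative moment estimate is truncated to zero, a common convention.
    between = max(0.0, (ms_between - ms_within) / n)
    return between, ms_within


batch_a = [0.02, -0.05, 0.01, 0.03]
batch_b = [-0.01, 0.04, -0.02, 0.00]
between, within = one_way_variance_components([batch_a, batch_b])
print(f"between-batch {between:.5f}, within-batch {within:.5f}")
```

A between-batch component near zero relative to the within-batch component corresponds to the finding that microarray batch is not a significant source of variance.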
Figure 5. Variation in individual element responses for 180 randomly selected genes over the full range of observed signal response (expressed as log Cy5 signal). All 10 replicate data points for each selected element are plotted along the vertical axis. Horizontal lines define the tolerance interval outside of which DE was deemed significant (see text). (A) Homotypic placental hybridizations. (B) Homotypic brain hybridizations. (C) Homotypic heart hybridizations.



Differential expression 

Using placental mRNA as a common reference, four sets of 
experimental conditions to measure differential expression 
were evaluated. Each set contained 10 replicate hybridizations: 
brain:placenta, placenta:brain, heart:placenta and placenta:heart.
Estimates of system precision and detection limits were made 
as described above for the homotypic hybridizations. 
Figure 6. Scatter plot of Cy3-labeled cDNA from heart (x-axis) hybridized to the array with Cy5-labeled cDNA from placenta (y-axis) (single experiment). Compare with Figure 4A.



Figure 6 shows the fluorescence response plot of a single 
representative experiment conducted with Cy3-labeled cDNA 
from heart competitively hybridized to the array with Cy5-labeled 
cDNA prepared from placenta. Most of the elements (>90%) 
fell on or close to the 45° line representing no differential 
expression (or DE = 1.00). However, in contrast to the homotypic 
hybridizations (Fig. 4A), 10% of the elements were also
observed to fall outside the tolerance interval, which may 
indicate significant differential expression (Table 1). 
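Once a tolerance interval is fixed, flagging candidate differentially expressed elements is a simple comparison. The Python sketch below assumes symmetric bounds of 1/1.4 and 1.4 on the DE scale (one reading of the ±1.4 interval quoted above) and uses hypothetical element names. With Cy3 on heart and Cy5 on placenta, "up" means a higher signal in heart.

```python
def flag_differential(de_by_element, lower=1 / 1.4, upper=1.4):
    """Return {element: 'up' or 'down'} for DEs outside [lower, upper]."""
    flags = {}
    for element, de in de_by_element.items():
        if de > upper:
            flags[element] = "up"
        elif de < lower:
            flags[element] = "down"
    return flags


def fraction_flagged(de_by_element, lower=1 / 1.4, upper=1.4):
    """Share of elements falling outside the tolerance interval."""
    return len(flag_differential(de_by_element, lower, upper)) / len(de_by_element)


# Hypothetical heart:placenta DE values (Cy3 heart / Cy5 placenta).
example = {"gene_a": 1.05, "gene_b": 3.2, "gene_c": 0.4, "gene_d": 0.9}
print(flag_differential(example))  # {'gene_b': 'up', 'gene_c': 'down'}
print(fraction_flagged(example))   # 0.5
```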

From 10 such replicate experiments in this set we calculated 
a CV for each of the 10 000 elements and plotted the values 
against the overall dynamic signal range (as a function of log 
Cy5 fluorescence signal) as shown in Figure 7A. The average 
CV was observed to be 10-12% across the entire signal range, 
although there was slightly greater variation at low signal levels. Figure 7B shows the CV for the same 10 000 elements plotted as a function of average DE. Most elements are observed to cluster near the value 0 (ln DE = 0), indicating no differential expression. However, the CV of 12% observed for non-differentiated elements, on average, was slightly smaller than the CV for differentiated elements in either direction. The observed average CV ranged from 12% for non-differentiated elements to a maximum value of 25% for elements differentially expressed by a factor of 100. Because the DE is a ratio of the signals from the two channels, variation in the denominator has a larger impact when that channel's signal is low. Despite these minor differences, overall system precision remains excellent.

Figure 7. (A) CV for each of 10 000 elements derived from 10 replicate heart:placenta hybridizations plotted as a function of the average observed signal (as Cy5 signal). (B) CV for the same 10 000 elements plotted as a function of the natural log of the average observed DE (ln DE).

Figure 8. Reciprocal labeling experiments showing the data plotted from 180 random elements from (A) 10 replicate brain:placenta (black symbols) and 10 replicate placenta:brain (blue symbols) hybridizations versus log DE, and (B) 10 replicate heart:placenta (black) and 10 replicate placenta:heart (blue) hybridizations.
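The point about the denominator can be made concrete with the standard first-order (delta-method) approximation for the CV of a ratio of two independent signals, CV(Cy3/Cy5)^2 ≈ CV(Cy3)^2 + CV(Cy5)^2. This is a textbook approximation rather than a formula from the paper, and the per-channel CVs below are invented for illustration.

```python
import math


def ratio_cv(cv_numerator, cv_denominator):
    """First-order CV of X/Y for independent X and Y with small CVs."""
    return math.sqrt(cv_numerator ** 2 + cv_denominator ** 2)


# Hypothetical per-channel CVs: a bright element vs. one with a weak Cy5 signal.
print(f"{ratio_cv(0.08, 0.08):.3f}")  # 0.113: both channels well above background
print(f"{ratio_cv(0.08, 0.20):.3f}")  # 0.215: the noisy low-signal denominator dominates
```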

The same 180 random elements in Figure 5 were evaluated 
in 'reciprocal dye labeling' experiments. Theoretically, the 
Cy3- and Cy5-labeled primers should function equivalently for 
cDNA synthesis. However, any difference in label incorporation would, if significant, indicate differential expression where none exists. Such a difference could also account for some of the
variation we observe in the different parameters evaluated in 
this study. Therefore, we performed a series of additional 
experiments specifically designed to address this issue. 

The data from 10 replicates of the brain:placenta hybridizations were compared to the data from 10 replicates of the reciprocally labeled placenta:brain hybridizations. Figure 8A shows a plot of the DE for 180 random elements from both sets of data. The DE for any given element in the first set of hybridizations should simply be the reciprocal of the DE for the same element in the second set (when the labeling is reversed). As Figure 8A shows, the cluster of 10 data points for each element from set 1 lies the same distance above the horizontal line through log10 1.0 = 0 as the corresponding cluster from set 2 lies below it. Figure 8B shows a similar plot generated from 20 microarrays, where 10 heart:placenta hybridizations were compared to the reciprocally labeled placenta:heart hybridizations, with essentially equivalent results.

For each element we can define the axial symmetry of reflection 
(ASR) as the inflection point between the DEs from the 
reciprocal labeling experiments, calculated by averaging the 
two DE ratios. Calculated average ASR values of 0.998 and 
0.999 were obtained from the placenta:brain and placenta:heart 
data sets, respectively, in good agreement with the theoretical 
value of 1.00. Thus any systematic bias introduced into the DE 
by reciprocal labeling must be less than 1-2 parts in 1000. 
These results independently verify the precision in measuring differential expression, as well as in identifying those genes that are not differentially expressed. Histograms showing the distribution of all elements (as a percentage of the total) as a function of ln ASR (Fig. 9A and B) were similar to the histogram observed for non-differentiated elements (Fig. 4B). They also had the same standard deviation. Therefore, any variation observed in DE was likely a result of real variations in experimental mRNA levels, rather than an artifact of the labeling system.

Figure 9. Histograms showing the distribution of all elements as a function of ln ASR from reciprocal labeling experiments. (A) Data for brain:placenta and placenta:brain hybridizations. (B) Data for heart:placenta and placenta:heart hybridizations.
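The ASR definition can be sketched in a few lines of Python. The text says the two DE ratios are averaged; because the reflection is about the line log 1.0 = 0, this sketch assumes the averaging is done on the log scale (equivalently, a geometric mean), so that a perfectly reciprocal pair such as 2.0 and 0.5 gives ASR = 1. That choice, the function name and the sample numbers are assumptions.

```python
import math
from statistics import fmean


def axial_symmetry_of_reflection(forward_des, reverse_des):
    """ASR for one element from its forward and dye-swapped replicate DEs.

    Averages ln(DE) within each labeling direction, takes the midpoint of the
    two averages and maps it back to the DE scale (a geometric-mean midpoint).
    """
    forward = fmean(math.log(d) for d in forward_des)
    reverse = fmean(math.log(d) for d in reverse_des)
    return math.exp((forward + reverse) / 2.0)


# Hypothetical element roughly 2-fold higher in brain than in placenta.
brain_placenta = [2.02, 1.97, 2.05, 1.99]
placenta_brain = [0.50, 0.49, 0.51, 0.50]
print(f"ASR = {axial_symmetry_of_reflection(brain_placenta, placenta_brain):.3f}")
```

An ASR close to 1.00 for an element means the dye swap simply inverted its DE, which is the behaviour the text reports (average ASR of 0.998 and 0.999).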

A series of independent yeast standards was also included on 
each microarray to assist in evaluating overall system perform- 
ance. These controls demonstrated linearity in overall signal 
response over three orders of magnitude, a CV of 12% and a 
limit of detection of 2 pg mRNA at a signal-to-background 
ratio of 2.5 (data not shown). 

CONCLUSION 

In this report we have described measures important in the 
manufacture of cDNA microarrays and in the preparation and 
labeling of mRNAs for use in a two-channel hybridization 
system. Furthermore, the results presented in this report 
demonstrate in a quantitative fashion the performance of the 
cDNA microarray technology platform. The usefulness of any 
expression database is ultimately dependent on the quality of 
the underlying data used to construct it. We report that the 
cDNA microarray platform does provide the high quality data 
needed to establish reliable gene expression databases. 



The analytical methods used to evaluate the performance of 
the cDNA microarray platform described in this report provide 
a practical framework for evaluating the performance of other 
technologies that purport to measure global mRNA expression. 
Only by disclosing the performance characteristics in a 
rigorous manner can researchers gauge the utility of any data 
produced by other platforms. 
Gene Expression Profile of
Aging and Its Retardation by 
Caloric Restriction 

Cheol-Koo Lee,1,3 Roger G. Klopp,2
Richard Weindruch,4* Tomas A. Prolla3*

The gene expression profile of the aging process was analyzed in skeletal muscle 
of mice. Use of high-density oligonucleotide arrays representing 6347 genes 
revealed that aging resulted in a differential gene expression pattern indicative 
of a marked stress response and lower expression of metabolic and biosynthetic 
genes. Most alterations were either completely or partially prevented by caloric 
restriction, the only intervention known to retard aging in mammals. Transcriptional patterns of calorie-restricted animals suggest that caloric restriction
retards the aging process by causing a metabolic shift toward increased protein 
turnover and decreased macromolecular damage. 



Most multicellular organisms exhibit a pro- 
gressive and irreversible physiological de- 
cline that characterizes senescence, the mo- 
lecular basis of which remains unknown. 
Postulated mechanisms include cumulative 
damage to DNA leading to genomic instabil- 
ity, epigenetic alterations that lead to altered 
gene expression patterns, telomere shortening 
in replicative cells, oxidative damage to crit- 
ical macromolecules by reactive oxygen spe- 
cies (ROS), and nonenzymatic glycation of 
long-lived proteins (1, 2). 

Genetic manipulation of the aging process 
in multicellular organisms has been achieved 
in Drosophila through the overexpression of
catalase and Cu/Zn superoxide dismutase (3),
in the nematode Caenorhabditis elegans 
through alterations in the insulin receptor sig- 
naling pathway (4), and through the selection 
of stress-resistant mutants in either organism 
(5). In mammals, mutations in the Werner 
Syndrome locus (WRN) accelerate the onset 
of a subset of aging-related pathology in humans (6), but the only intervention that appears to slow the intrinsic rate of aging is caloric restriction (CR) (7). Most studies
have involved laboratory rodents which, 
when subjected to a long-term, 25 to 50% 
reduction in calorie intake without essential 
nutrient deficiency, display delayed onset of 
age-associated pathological and physiologi- 
cal changes and extension of maximum life- 
span. Postulated mechanisms of action in- 
clude increased DNA repair capacity, altered 
gene expression, depressed metabolic rate, 
and reduced oxidative stress (7). 

To examine the molecular events associ- 
ated with aging in mammals, we used oligo- 
nucleotide-based arrays to define the tran- 
scriptional response to the aging process in 
mouse gastrocnemius muscle. Our choice of 
tissue was guided by the fact that skeletal 
muscle is primarily composed of long-lived, 
high oxygen-consuming postmitotic cells, a 



feature shared with other critical aging targets 
such as heart and brain. Loss of muscle mass 
(sarcopenia) and associated motor dysfunc- 
tion is a leading cause of frailty and disability 
in the elderly (8). At the histological level, 
aging of gastrocnemius muscle in mice is 
characterized by muscle cell atrophy, varia- 
tions in size of muscle fibers, presence of 
lipofuscin deposits, collagen deposition, and 
mitochondrial abnormalities (9).

A comparison of gastrocnemius muscle 
from 5-month (adult) and 30-month (old) 
mice (10-12) revealed that aging is associat- 
ed with alterations in mRNA levels, which 
may reflect changes in gene expression, 
mRNA stability, or both. Of the 6347 genes 
surveyed in the oligonucleotide microarray, 
only 58 (0.9%) displayed a greater than two- 
fold increase in expression levels as a func- 
tion of age, whereas 55 (0.9%) displayed a 
greater than twofold decrease in expression. 
These findings are in agreement with a dif- 
ferential display analysis of gene expression 
in tissues of aging mice (13). Thus, the aging 
process is unlikely to be due to large, wide- 
spread alterations in gene expression. 
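The Table 1 caption states that each fold change is the average of all nine possible pairwise comparisons among individual mice, which implies three animals per age group. The Python sketch below illustrates that idea with a plain old/adult intensity ratio for each pair of animals and then counts genes beyond the twofold threshold. The simple ratio stands in for the unspecified algorithm of reference 12, and the gene names and intensities are hypothetical.

```python
from itertools import product
from statistics import fmean


def pairwise_fold_change(adult, old):
    """Average old/adult ratio over every adult-old pair of animals."""
    return fmean(o / a for a, o in product(adult, old))


def count_twofold(fold_changes, threshold=2.0):
    """Count genes changed by more than `threshold`-fold upward and downward."""
    up = sum(1 for f in fold_changes if f > threshold)
    down = sum(1 for f in fold_changes if f < 1.0 / threshold)
    return up, down


# Hypothetical intensities for three adult and three old mice per gene.
genes = {
    "stress_gene": ([100, 110, 95], [260, 240, 250]),
    "metabolic_gene": ([400, 380, 420], [150, 160, 140]),
    "housekeeping_gene": ([300, 310, 290], [305, 295, 300]),
}
folds = [pairwise_fold_change(adult, old) for adult, old in genes.values()]
print(count_twofold(folds))        # (1, 1)
# With the study's counts, 58 of 6347 surveyed genes is 0.9%.
print(round(100 * 58 / 6347, 1))   # 0.9
```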

Functional classes were assigned to genes 
displaying the largest alterations in expres- 
sion (Table 1). Of the 58 genes that increased 
in expression with age, 16% were mediators 
of stress responses, including the heat shock 
factors Hsp71 and Hsp27, protease Do, and 
the DNA damage-inducible gene GADD45 
(14). The largest differential expression be- 
tween adult and aged animals (a 3.8-fold 
induction) was observed for the gene encoding the mitochondrial sarcomeric creatine kinase, a critical target for ROS-induced inactivation (15).

A consequence of skeletal muscle aging is 
loss of motor neurons followed by reinnerva- 
tion of muscle fibers by the remaining intact 
neuronal units (16). Genes involved in neuronal 
growth accounted for 9% of genes highly induced in 30-month-old animals, including neurotrophin-3 (17), a growth factor induced during reinnervation, and synaptic vesicle protein-2, implicated in neurite extension (18). PEA3, a
transcriptional factor induced in the response to 
muscle injury and previously shown to be highly expressed in muscle from old rats (19), was
also induced in aged muscle. We also observed 
parallels between our results and data from 
fibroblasts undergoing in vitro replicative senescence. For example, HIC-5, a transcriptional
factor induced by oxidative damage, and insu- 
lin-like growth factor binding protein, both as- 
sociated with in vitro senescence (20), are in- 
duced in aged skeletal muscle. 

Fifty-five (0.9%) genes displayed a greater 
than twofold age-related decrease in expres- 
sion. Genes involved in energy metabolism ac- 
counted for 13% of these alterations (Table 1). 
These include alterations in genes associated 
with mitochondrial function and turnover, such 
as the adenosine 5′-triphosphate (ATP) synthase A chain and nicotinamide adenine dinucleotide phosphate (NADP) transhydrogenase genes (both involved in mitochondrial bioenergetics), the LON protease implicated in mitochondrial biogenesis, and the ERV1 gene involved in mitochondrial DNA (mtDNA) maintenance (21). Additionally, a decrease in metabolic activity is suggested through a decline in the expression of genes involved in glycolysis,
glycogen metabolism, and the glycerophos- 
phate shunt (Table 1). 

Aging was also characterized by large re- 
ductions (twofold or more) in the expression of 
biosynthetic enzymes such as squalene syn- 
thase (fatty acid and cholesterol synthesis), 
stearoyl-coenzyme A (CoA) desaturase (polyunsaturated fatty acid synthesis), and EF-1-gamma (protein synthesis). This suppression was accompanied by a concerted decrease in the expression of genes involved in protein turnover, such as the 20S proteasome subunit, the 26S proteasome component TBP1, ubiquitin-thiolesterase, and the Unp ubiquitin-specific protease, all of which are involved in the ubiquitin-proteasome pathway of protein turnover (22). The directions of changes in other
functional categories, such as signal transduc- 
tion, and transcriptional and growth factors, did 
not present a consistent age-related trend. 

In order to study the effects of CR on the 
gene expression profile of aging, we reduced 
caloric intake of C57BL/6 mice to 76% of 
that fed to control animals in early adulthood 
(2 months of age), and this dietary regimen 
was maintained until animals were killed at 
30 months. A comparison of 30-month-old
control and CR mice revealed that aging- 
related changes in gene expression profiles 
were remarkably attenuated by CR. Of the 
largest age-associated alterations (twofold or 
higher), 29% were completely prevented by 
CR and 34% were partially suppressed (Table 
1). Of the four major gene classes that dis- 
played consistent age-associated alterations, 
84% were either completely or partially suppressed by CR. Thus, at the molecular level, CR mice appear to be biologically younger than animals receiving the control diet.

Table 1. Aging-related changes in gene expression in muscle. The extent to which caloric restriction prevented age-associated alterations in gene expression is denoted as either C (complete, >90%), N (none), or partial (20 to 90%, percentage effect indicated). The fold increase shown represents the average of all nine possible pairwise comparisons among individual mice determined by means of a specific algorithm (12). GenBank accession numbers are listed under ORF. A more comprehensive list that includes genes that did not fit into the six classes can be found at www1.genetics.wisc.edu/prolla/Prolla_Tables.html.

Table 1 (partially recovered; ORF, change in fold, gene):
Heat Shock 27 kDa Protein
3.5  Serum Amyloid A
3.4  Heat Shock 71 kD
2.6  GADD45
2.4  Aldehyde Dehydrogenase II
X78197  2.2  AP-2 Beta
X89749  2.1  mTGIF
AA014024  2.1  Dynactin
X63190  2.1  PEA3
Phox2 Homeodomain Protein
Calpactin I Light Chain
Functional classes legible: Glycogen metabolism; Peroxisome assembly.

Table 2. Caloric restriction-induced alterations in gene expression. The data represent a comparison between 30-month-old CR-fed and control-fed mice. The gene expression alterations listed in this table are diet related and do not include those representing prevention of age-associated changes (see Table 1). Additional CR-induced changes are posted at the aforementioned Web site.

Table 2 (partially recovered; gene: function):
Signal Recognition Receptor Alpha Subunit: Protein synthesis
Proteasome Activator PA28 Alpha Subunit: Protein turnover
mCyP-S1 (Cyclophilin): Protein folding
Translocon-Associated Protein Delta: Protein translocation
60S Ribosomal Protein L23: Protein synthesis
Fatty Acid Synthase: Fatty acid synthesis
Glutamine Synthetase: Glutamine synthesis
Cytochrome P450-IIC12: Steroid biosynthesis
Thymidylate Kinase: dTTP synthesis
DNA Polymerase Beta (1.7-fold)
XPE (1.6-fold)
Thyroid Hormone Receptor Alpha-2 (1.6-fold)
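The Table 1 caption grades how far CR prevented each age-related change: C (complete, >90%), partial (20 to 90%) or N (none). The formula is not given in this excerpt, so the Python sketch below assumes the percentage is the fraction of the adult-to-old change that is absent in old CR mice, computed on group mean expression; treating values below 20% as N is likewise an assumption, and the numbers are hypothetical.

```python
def percent_prevented(adult, old_control, old_cr):
    """Percent of the adult-to-old change that is absent in old CR animals."""
    age_change = old_control - adult
    if age_change == 0:
        raise ValueError("no age-related change to prevent")
    return 100.0 * (old_control - old_cr) / age_change


def prevention_class(percent):
    """Map a prevention percentage to the C / partial / N labels."""
    if percent > 90:
        return "C"
    if percent >= 20:
        return f"{percent:.0f}%"
    return "N"


# Hypothetical group means for a gene induced 3-fold with age.
for cr_level in (105, 200, 290):
    pct = percent_prevented(adult=100, old_control=300, old_cr=cr_level)
    print(cr_level, prevention_class(pct))
# prints: 105 C, then 200 50%, then 290 N
```

Because the change is computed as a signed difference, the same function handles genes that decrease with age.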

Caloric restriction induced a metabolic
reprogramming characterized by a transcrip- 
tional shift toward energy metabolism, in- 
creased biosynthesis, and protein turnover 
(Table 2). CR resulted in the induction of 51 
genes (1.8-fold or higher) as compared with 
age-matched animals consuming the control 
diet. Nineteen percent of genes in this class 
are related to energy metabolism. Modulation 
of energy metabolism was evident through 
the induction of glucose-6-phosphate isomer- 
ase (glycolysis), fructose 1,6-bisphosphatase 
(gluconeogenesis), IPP-2 (an inhibitor of gly- 
cogen synthesis), and transketolase. Fructose 
1,6-bisphosphatase switches the direction of 
a key regulatory step in glycolysis toward a 
biosynthetic precursor, glucose-6-phosphate. 
Remarkably, this same adaptation has been 
observed as part of the transcriptional reprogramming of Saccharomyces cerevisiae accompanying the diauxic switch from anaerobic growth to aerobic respiration upon depletion of glucose (23). Transketolase, which
controls the nonoxidative branch of the pen- 
tose phosphate pathway, provides NADPH 
for biosynthesis and reducing power for sev- 
eral antioxidant systems. CR also induced 
transcripts associated with fatty acid metab- 
olism, such as fatty acid synthase and PPAR- 
delta, a mediator of peroxisome proliferation. 
Interestingly, CR may act to increase insulin 
sensitivity through the induction of glucose- 
dependent insulinotropic peptide and PPAR- 
gamma, a potent insulin sensitizer (24). 

Biosynthetic ability also appears to be 
induced in CR mice. CR up-regulated the 
expression of glutamine synthetase, purine nucleoside phosphorylase (purine turnover),
and thymidylate kinase (dTTP synthesis). Re- 
markably, 16% of transcripts highly induced 
by CR encode proteins involved in protein 
synthesis and turnover, including elongation 
factor 1-gamma, proteasome activator PA28, 
translocon-associated protein delta, 60S ribosomal protein L23, and the 26S proteasome
subunit TBP-1. 

CR was associated with a 1.6-fold or greater
reduction in expression of 57 genes. Of these, 
12% were associated with stress responses or 
DNA repair pathways, or both (Table 2). 
Among the 6347 genes examined, the most 
substantial suppression of gene expression by 
CR was for a murine DnaJ homolog (3.4-fold), 
a pivotal and inducible heat shock factor that 
senses and transduces the presence of mis- 
folded or damaged proteins in bacteria (25). CR 
also lowered the expression of cytochrome 
P450 isoforms IIIA and Cyp1b1 (involved in
detoxification), Hsp105 (a heat shock factor),
aldehyde dehydrogenase (an inducible enzyme 
involved in detoxification of metabolic by- 
products), and an oxidative stress-induced pro- 
tein of unknown function. CR reduced the ex- 
pression of several DNA repair genes including 
XPE (a factor that recognizes multiple DNA 
adducts), RAD50 (involved in double-strand 
break repair), and DNA polymerase beta (a
DNA damage-inducible polymerase). We also 
find molecular evidence to support a state of 
lower basal metabolic rate in CR mice through 
lowered expression of the thyroid-hormone re- 
ceptor alpha gene (26). 

The data presented here provide the first 
global assessment of the aging process in 
mammals at the molecular level and under- 
score the utility of large-scale, parallel gene 
expression analysis in the study of complex 
biological phenomena. We estimate that the 
6347 genes analyzed in this study represent 5 
to 10% of the mouse genome. Additional 
classes of aging-related genes in skeletal 
muscle may be discovered with the develop- 
ment of higher density mammalian DNA mi- 
croarrays. The observed collection of gene ex- 
pression alterations in aging skeletal muscle is 
complex, reflecting the presence of myocyte, 
neuronal, and vascular components. Although 
some of the age-associated alterations in gene 



expression could represent maturational chang- 
es, this possibility is unlikely given the fact that 
the 5-month-old (adult) mice used in this study
were fully mature animals. Importantly, chang- 
es in mRNA levels may not always result in a 
parallel alteration in protein levels. However, 
the complete or partial prevention of most age- 
related alterations by CR suggests that gene 
expression profiles can be used to assess the 
biological age of mammalian tissues, providing 
a tool for evaluating experimental interventions. 

Taken as a whole, our results provide evi- 
dence that during aging there is an induction of 
a stress response as a result of damaged proteins 
and other macromolecules. This response en- 
sues as the systems required for the turnover of 
such molecules decline, perhaps as a result of 
an energetic deficit in the cell. In particular, the 
observed alterations in transcripts associated 
with energy metabolism and mitochondrial 
function may reflect either decreased mitochon- 
drial biogenesis or turnover secondary to cumu- 
lative ROS-inflicted mitochondrial damage (2), 
lending support to the concept that mitochon- 
drial dysfunction plays a central role in aging of 
postmitotic tissues. The gene expression profile 
also suggests that secondary responses to the 
aging process in skeletal muscle involve the 
activation of neuronal and myogenic responses 
to injury.

A summary of global changes induced by 
aging, and the contrasting effects of CR, are 
shown in Table 3. The transcriptional activation 
of stress response genes that process damaged 
or misfolded proteins during aging, and the 
prevention of this induction by CR, suggest a 
central role for protein modifications in aging. 
Indeed, aging is characterized by an exponen- 
tial increase of oxidatively damaged proteins 
(27). Previous analyses of metabolic rates in 
CR animals have led to the suggestion that this 
life-extending regimen acts through a reduction 
in metabolic rate, resulting in a lower produc- 
tion of toxic by-products of metabolism (28). 
The CR-mediated reduction of mRNAs encod- 
ing inducible genes involved in metabolic de- 
toxification, DNA repair, and the response to 
oxidative stress supports this view, because it 
implies lower substrate availability for these 
systems. Additionally, our analysis indicates 
that CR may cause a metabolic shift toward 
increased biosynthesis and macromolecular 
turnover. A hormonal trigger for this shift may
be an alteration in the insulin signaling pathway 
through increased expression of genes that me- 
diate insulin sensitivity, a finding that links our 
observations to those obtained through the ge- 
netic analysis of aging in the nematode C. 
Selenium is essential for male fertility in rodents and has also been implicated in the fertilization capacity of spermatozoa of livestock and humans (1). Selenium deficiency is associated with impaired sperm motility, structural
alterations of the midpiece, and loss of flagel- 
lum (1). However, three decades after the dis- 
covery of selenium as an integral constituent of 
redox enzymes (2), the molecular basis of the 
relationship of the essential trace element and 
male fertility remains obscure. The selenopro- 
tein PHGPx (Enzyme Commission number 
1.11.1.12) is abundantly expressed in spermatids and displays high activity in postpubertal
testis (3). In mature spermatozoa, however, se- 
lenium is largely restricted to the mitochondrial 
capsule, a keratin-like matrix that embeds the 
helix of mitochondria in the sperm midpiece
(4). A "sperm mitochondria-associated cys- 
teine-rich protein (SMCP)" (5) had been con- 
sidered to be the selenoprotein accounting for 
the selenium content of the mitochondrial cap- 
sule (4-6). The rat SMCP gene, however, does 
not contain an in-frame TGA codon (7) that 
would enable a selenocysteine incorporation 
(8). In mice, the three in-frame TGA codons of 
the SMCP gene are upstream of the translation 
start (5). SMCP can therefore no longer be 
considered as a selenoprotein. Instead, the "mi- 
tochondrial capsule selenoprotein (MCS)," as 
SMCP was originally referred to (4-7), is here 
identified as PHGPx. 

Routine preparations of rat sperm mito- 
chondrial capsules (9) yielded a fraction that 
was insoluble in 1% SDS containing 0.2 mM 
dithiothreitol (DTT) and displayed a vesicu- 
lar appearance in electron microscopy (Fig. 
1A). The vesicles readily disintegrated upon 
exposure to 0.1 M mercaptoethanol (Fig. 1B)
and became fully soluble in 6 M guanidine-HCl.
When the solubilized capsule material
was subjected to polyacrylamide gel electro- 
phoresis (PAGE), four bands in the 20-kD 
The selenoprotein phospholipid hydroperoxide glutathione peroxidase (PHGPx)
changes its physical characteristics and biological functions during sperm maturation.
PHGPx exists as a soluble peroxidase in spermatids but persists in
mature spermatozoa as an enzymatically inactive, oxidatively cross-linked, 
insoluble protein. In the midpiece of mature spermatozoa, PHGPx protein 
represents at least 50 percent of the capsule material that embeds the helix of 
mechanical instability of the mitochondrial midpiece that is observed in selenium deficiency.
Abstract

Background: Ras is an area of intensive biochemical and genetic studies and characterizing downstream 
components that relay ras-induced signals is clearly important. We used a systematic approach, based on 
DNA microarray technology to establish a first catalog of genes whose expression is altered by ras and, 
as such, potentially involved in the regulation of cell growth and transformation. 

Results: We used DNA microarrays to analyze gene expression profiles of rasV12/E1A-transformed
mouse embryonic fibroblasts. Among the ~12,000 genes and ESTs analyzed, 815 showed altered
expression in rasV12/E1A-transformed fibroblasts, compared to control fibroblasts, of which 203
corresponded to ESTs. Among known genes, 202 were up-regulated and 410 were down-regulated. About
one half of genes encoding transcription factors, signaling proteins, membrane proteins, channels or 
apoptosis-related proteins was up-regulated whereas the other half was down-regulated. Interestingly, 
most of the genes encoding structural proteins, secretory proteins, receptors, extracellular matrix 
components, and cytosolic proteins were down-regulated whereas genes encoding DNA-associated 
proteins (involved in DNA replication and reparation) and cell growth-related proteins were up-regulated. 
These data may explain, at least in part, the behavior of transformed cells in that down-regulation of 
structural proteins, extracellular matrix components, secretory proteins and receptors is consistent with 
reversion of the phenotype of transformed cells towards a less differentiated phenotype, and up-regulation 
of cell growth-related proteins and DNA-associated proteins is consistent with their accelerated growth. 
Yet, we also found very unexpected results. For example, proteases and inhibitors of proteases as well as 
all 8 angiogenic factors present on the array were down-regulated in transformed fibroblasts although they 
are generally up-regulated in cancers. This observation suggests that, in human cancers, proteases, 
protease inhibitors and angiogenic factors could be regulated through a mechanism disconnected from ras 
activation. 

Conclusions: This study established a first catalog of genes whose expression is altered upon fibroblast transformation by ras V12/E1A. This catalog is representative of the genome but not exhaustive, because only one third of expressed genes were examined. In addition, the contribution of post-transcriptional and post-translational modifications to ras signaling was not addressed. Yet the information gathered should be quite useful for future investigations of the molecular mechanisms of oncogenic transformation.
Background

Cancer is a disease caused by multiple genetic alterations that lead to uncontrolled cell proliferation. This process often involves activation of cellular proto-oncogenes and inactivation of tumour-suppressor genes. One of the earliest and most potent oncogenes identified in human cancer is mutant ras [1,2]. The ras family of proto-oncogenes encodes small GTP-binding proteins that transduce mitogenic signals from tyrosine-kinase receptors [reviewed in [3]]. In vitro, oncogenic ras efficiently transforms most immortalized rodent cell lines but fails to transform mouse primary cells [4]. However, ras can transform primary mouse cells by cooperating with other oncogenic alterations such as overexpression of c-Myc, dominant negative p53, D-type cyclins, Cdc25A and Cdc25B, or loss of p53, p16 or IRF-1 [5-7]. Several viral oncoproteins can also cooperate with ras, for example SV40 T-antigen, adenovirus E1A, human papillomavirus E7 and HTLV-1 Tax [reviewed in [6,7]]. When expressed alone in primary cells, most of these alterations facilitate immortalization [7]. Oncogenic transformation of primary cells by co-expression of ras and immortalizing mutations constitutes a model of multistep tumorigenesis that has been reproduced in animal systems [reviewed in [8,9]].

Ras has been an area of intensive biochemical and genetic study [10]. These studies helped to characterize the downstream signaling events and components that relay ras-induced mitogenic signals to the transcription factors that ultimately regulate expression of genes involved in cell growth and transformation. Downstream signaling elicited by the oncogenic form of the Ras protein impairs regulation of gene expression, eventually disrupting normal cellular functions. Downstream transcription factors were found to be essential for ras-mediated cell transformation [11-13]. However, compared with our knowledge of ras signaling events, little is known about the target genes involved in the phenotypic changes that result from ras activation, such as cell transformation. Identifying genes whose expression is altered during ras-mediated cell transformation would therefore provide important information on the underlying molecular mechanism. In the present work, we used DNA microarray technology to analyze gene expression profiles of ras V12/E1A-transformed primary mouse embryonic fibroblasts (MEFs), in order to identify genes whose expression is transformation-dependent.

Results 

Analysis of gene expression changes after ras V12/E1A-transformation

We used microarray analysis to compare expression profiles of ~12,000 genes in normal vs. ras V12/E1A-transformed fibroblasts. Figure 1 shows the phenotypic changes of the ras V12/E1A-transformed MEFs. With Affymetrix microarray technology, differential expression values greater than 1.7 fold are likely to be significant, based on internal quality control data. We present data that use a more stringent ratio, restricting our analysis to genes that are over- or under-expressed at least 2.0 fold in ras V12/E1A-transformed fibroblasts relative to MEFs transduced with the empty retrovirus. We summarize the highlights below and present the full profile in Figure 2.
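The at-least-2.0-fold criterion can be made concrete with a short sketch. It assumes one expression value per gene for the control (empty pBabe) and the transformed sample, and reports a decrease as control/transformed with a negative sign, so that up- and down-regulation share one scale, consistent with how repression is quoted below (e.g. "123 fold"). The function names are hypothetical and the example signals are invented, not data from this study.

```python
"""Signed fold change with a symmetric cutoff (illustrative sketch only)."""


def signed_fold_change(control: float, transformed: float) -> float:
    """Return T/C for increases and -C/T for decreases.

    Expressing a decrease as C/T puts up- and down-regulation on the same
    scale, so a 4-fold repression and a 4-fold induction have equal magnitude.
    """
    if control <= 0 or transformed <= 0:
        raise ValueError("expression values must be positive")
    if transformed >= control:
        return transformed / control
    return -control / transformed


def classify(control: float, transformed: float, cutoff: float = 2.0) -> str:
    """Label a gene 'up', 'down' or 'unchanged' under an 'at least cutoff-fold' rule."""
    fc = signed_fold_change(control, transformed)
    if fc >= cutoff:
        return "up"
    if fc <= -cutoff:
        return "down"
    return "unchanged"


if __name__ == "__main__":
    # Hypothetical (control, transformed) signals, not values from this study.
    examples = {"gene_a": (100.0, 250.0), "gene_b": (400.0, 100.0), "gene_c": (100.0, 180.0)}
    for name, (c, t) in examples.items():
        print(name, round(signed_fold_change(c, t), 2), classify(c, t), classify(c, t, cutoff=1.7))
```

With the 1.7 cutoff mentioned above, the hypothetical gene_c (1.8 fold) would count as changed; under the stricter 2.0 rule it would not.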

Among the ~12,000 genes and ESTs analyzed, expression of 815 was altered by at least 2.0 fold in the ras V12/E1A-transformed fibroblasts, of which 203 corresponded to ESTs. Among known genes, 202 were up-regulated (Table 1)(see Additional file 1) whereas 410 were down-regulated (Table 2)(see Additional file 2) by ras V12/E1A-transformation. It is interesting to note that about one half of the genes encoding transcription factors, signaling proteins, membrane proteins, channels, or apoptosis-related proteins were up-regulated, whereas the other half were down-regulated (Figure 2). However, after ras V12/E1A-transformation, most genes encoding structural proteins, secretory proteins, receptors, proteases, protease inhibitors, extracellular matrix components, proteins involved in angiogenesis and cytosolic proteins were down-regulated, whereas genes encoding DNA-associated proteins (involved in DNA replication and repair) and cell growth-related proteins were up-regulated (Figure 2). These data may explain, at least in part, the behavior of transformed cells. For example, down-regulation of structural proteins, extracellular matrix components, secretory proteins and receptors is consistent with reversion of transformed cells towards a less differentiated phenotype, and up-regulation of cell growth-related proteins and DNA-associated proteins is consistent with their accelerated growth.

Transcription factors 

57 genes encoding transcription factors were up-regulated and 45 down-regulated by ras V12/E1A-transformation. The most strongly activated genes corresponded to the homeobox protein SPX1 (39 fold), the myb proto-oncogene (25 fold) and the paired-like homeodomain transcription factor (19 fold), whereas the most repressed were osteoblast specific factor 2 (123 fold), the p8 protein (51 fold), the H19 mRNA (21 fold) and early B-cell factor (20 fold).

Structural proteins 

Expression of 10 genes encoding structural proteins was up-regulated in transformed MEFs, and 44 were down-regulated. The strongest up-regulation was observed for cytokeratin (26 fold) and desmoplakin 1 (17 fold), and the strongest down-regulations for smooth muscle calponin (115 fold), transgelin (49 fold), debrin (41 fold), p50b (35 fold) and vascular smooth muscle alpha-actin (34 fold).

Figure 1

A. Expression of RAS was verified by immunoblot analysis in MEFs transduced with pBabe (control) or pBabe-ras V12/E1A (transformed) retroviruses. B. Morphological aspect of the pBabe- and pBabe-ras V12/E1A-transduced mouse embryonic fibroblasts. C. Anchorage-independent growth of the ras V12/E1A-transformed MEFs. Fifty thousand cells were plated on 0.6% agar in DMEM-10% FCS and overlaid on 0.6% agar in the same medium. Photomicrographs were taken 10 days after plating. D. ras V12/E1A-transformed MEFs induce tumor formation. One million pBabe- or pBabe-ras V12/E1A-transduced mouse embryonic fibroblasts were injected in 200 µl PBS as xenografts in nude mice. Representative mice at day 18 are shown.

Figure 2

Gene expression changes after ras V12/E1A-transformation. Numbers of up-regulated and down-regulated genes were grouped by function (transcription factors, structural proteins, signaling, secretory proteins, receptors, protein synthesis, proteases, protease inhibitors, membrane proteins, extracellular matrix, enzymes, DNA-associated proteins, cytosolic proteins, channels, cell growth-associated proteins, angiogenesis, apoptosis and unknown function). Bars represent the number of genes in each group.

Signaling factors 

36 genes encoding proteins involved in numerous signaling pathways were up-regulated and 79 down-regulated in ras V12/E1A-transformed MEFs. The EGP314 precursor (also known as calcium signal transducer 1) was up-regulated 25 fold, whereas cysteine rich intestinal protein (41 fold) and ASM-like phosphodiesterase 3a (31 fold) were the most strongly down-regulated genes.

Secretory proteins 

Only one gene, encoding transforming growth factor alpha, was detected as up-regulated (3 fold) in transformed cells. By contrast, expression of 54 genes encoding secretory proteins was repressed after ras V12/E1A-transformation. The most affected genes were those encoding cholecystokinin (112 fold), serum amyloid A3 (85 fold), PRDC (58 fold), insulin-like growth factor binding protein 5 (41 fold), gremlin (36 fold), follistatin (33 fold), small inducible cytokine subfamily B (27 fold), cytokine SDF-1-beta (23 fold) and small inducible cytokine A7 (22 fold).

Receptors 

8 receptors were up-regulated and 38 down-regulated in transformed fibroblasts. Overexpression was observed for acetylcholine receptor beta (8 fold), tyrosine kinase receptor (3 fold), growth hormone releasing hormone receptor (3 fold), semaphorin M-sema G (3 fold) and amphiregulin (2 fold). The strongest down-regulations were found for integrin alpha 5 (43 fold), transient receptor protein 2 (19 fold), retinoic acid receptor alpha (14 fold), retinoic orphan receptor 1 (11 fold) and platelet derived growth factor receptor (12 fold).

Protein synthesis 

3 genes involved in protein synthesis (BRIX, nucleolin, ribosomal protein L44 and SIK similar protein) were overexpressed and 2 (ribosomal protein S4X and ribosomal protein L39) were down-regulated, suggesting that protein synthesis is not strongly affected by transformation.

Proteases and protease inhibitors 

Only the kallikrein B protease and the elafin-like protein II protease inhibitor were up-regulated (3 and 2 fold, respectively) after ras V12/E1A-transformation. By contrast, 16 proteases and 7 protease inhibitors were repressed in transformed MEFs. Tolloid-like (41 fold) and meltrin beta (33 fold) were the most down-regulated proteases, and tissue factor pathway inhibitor 2 (44 fold) and plasminogen activator inhibitor (31 fold) were the most affected protease inhibitors.

Membrane proteins 

15 genes encoding membrane proteins were up-regulated and 21 were down-regulated. Histocompatibility 2, D region locus (16 fold) and melanoma differentiation associated protein (9 fold) were the most overexpressed genes, whereas Thy-1.2 glycoprotein (36 fold), cadherin 11 (14 fold) and vascular cell adhesion molecule 1 (13 fold) were the most repressed.

Extracellular matrix 

Laminin gamma 1 (20 fold) and entactin-2 (6 fold) were the two extracellular matrix genes found up-regulated during transformation, whereas 24 genes were down-regulated. Among them, procollagen type VI, alpha 1 (121 fold), procollagen type III, alpha 1 (56 fold), procollagen type I, alpha 1 (44 fold), procollagen type I, alpha 2 (37 fold), collagen type VI, alpha 3 subunit (21 fold) and decorin (19 fold) were the most affected.

Enzymes 

Twelve genes encoding enzymes involved in cellular metabolism were overexpressed after ras V12/E1A-transformation and 44 were down-regulated. The most activated genes were serine hydroxymethyl transferase 1 (6 fold), acetyl coenzyme A dehydrogenase (5 fold) and the acetyltransferase (GNAT) family containing protein (4 fold), whereas the most repressed genes were lysozyme P (88 fold), lysyl oxidase (61 fold) and lysozyme M (55 fold). Interestingly, the largest increases were 6, 5 and 4 fold, whereas the largest decreases were 88, 61 and 55 fold, indicating that down-regulated enzyme genes were not only more numerous (44 vs. 12) but also changed more in magnitude.

DNA-associated proteins

25 genes encoding DNA-associated proteins were up-regulated, whereas no gene of this family was down-regulated. The most strongly activated genes were nucleoside diphosphate kinase (9 fold), topoisomerase-inhibitor suppressed (7 fold), helicase lymphoid specific (6 fold) and DNA2-like homolog (6 fold).

Cytosolic proteins 

Expression of 2 genes encoding cytosolic proteins was activated after ras V12/E1A-transformation, whereas expression of 6 genes was repressed. Genes coding for acyl-CoA-binding protein (3 fold) and tubulin-specific chaperone (2 fold) were overexpressed, whereas the most strongly repressed gene was that coding for cytochrome P450 (61 fold).

Channels 

5 genes encoding channels were up-regulated and 5 were down-regulated. Chloride channel protein 3 was the most up-regulated gene (11 fold) and the channel beta-1 subunit (15 fold) was the most down-regulated gene.

Cell growth-associated proteins 

As expected for transformed cells, which grow more rapidly, 13 genes encoding proteins involved in cell growth were overexpressed, whereas only 3 were down-regulated in ras V12/E1A-transformed MEFs. The most activated genes were those coding for cyclin-dependent kinase-like 2 (6 fold) and cell division cycle 7-like 1 (5 fold), whereas the most repressed gene was cyclin D2 (4 fold).

Angiogenesis 

Angiogenesis is a key process in carcinogenesis. Contrary to what would be expected for a tumor cell, we found no angiogenesis-associated genes up-regulated by ras V12/E1A-transformation. To our surprise, all 8 angiogenesis-associated genes showing differential expression were down-regulated. These included genes coding for thrombospondins 1 (15 fold), 2 (32 fold) and 3 (6 fold), pigment epithelium-derived factor (26 fold), pleiotrophin (24 fold), GRO1 oncogene (16 fold), angiogenin-related protein (4 fold) and tumor necrosis factor induced protein 2 (3 fold).

Apoptosis 

8 apoptosis-related genes were up-regulated in transformed MEFs and 3 down-regulated. The p53 apoptosis effector related to Pmp22 was the most activated gene (19 fold) and the death-associated protein 1 gene was the most under-expressed (4 fold) after transformation.

Unknown function 

3 genes encoding proteins without well-defined function were up-regulated in mutated ras/E1A-expressing fibroblasts, whereas 8 were down-regulated.

As a proof of principle, we verified the relative expression levels of 11 of these 815 genes by Northern blot analysis. The following 11 genes were tested: p8, transgelin, serum amyloid A3, lysyl oxidase, thrombospondin 2, extracellular superoxide dismutase, biglycan, myb, cytokeratin, HMG2 and ezrin. In all cases, Northern blot data confirmed the microarray data. The first 7 were down-regulated in transformed MEFs and the other 4 were overexpressed (Figure 3).

Discussion 

A number of ras-regulated genes have been identified by studies of immortalized cells or cancer cells expressing oncogenic ras [14-21]. However, although these results are quite interesting, it is important to note that established cell lines frequently undergo genetic and epigenetic changes that are selected during passaging or immortalization and may affect ras target-gene expression. Primary cultures, such as mouse embryonic fibroblasts, do not have that drawback. This is why, to identify ras target genes, we decided to analyze global gene expression shortly after retroviral transfer of an ectopic mutated ras into MEFs. Yet, because activated ras alone induces MEF senescence instead of transformation, we combined it with the adenovirus-derived oncogene E1A. The ras V12/E1A transformation of MEFs (and of other non-immortalized cells as well) is specific and controlled. Using the Affymetrix technology on ~12,000 genes, we found that expression of 6.8% of them was significantly modified in MEFs by ras V12/E1A-transformation. Because oncogenic transformation of fibroblasts allows tumor development when cells are injected into immunocompromised mice (see Figure 1), studying target genes of activated ras should improve our understanding of the molecular mechanisms by which ras transforms cells and eventually allows tumor formation. It is interesting to note that only 24% of down-regulated and 40% of up-regulated genes showed a strong modification (i.e., >5 fold change) of their expression after transformation.

Figure 3

Confirmation of microarray results by Northern blot analysis. 18S rRNA was used as a loading control. Total RNA isolated from pBabe- and pBabe-ras V12/E1A-transduced MEFs was blotted onto Hybond-N membranes and hybridized with 32P-labeled probes as described in the Methods section.

Several examples of genes up- or down-regulated upon ras transformation have already been reported [22-25]. The present systematic analysis of about one third of the expressed genome confirms those reports while considerably extending our knowledge of genes activated or repressed by oncogenic ras in association with the E1A adenoviral protein. Our results may explain the behavior of transformed cells. For example, and as expected, virtually all of the genes coding for secreted factors or extracellular matrix components, which are associated with a differentiated phenotype, were down-regulated. Also, the morphological changes observed after transformation (see Figure 1) may be explained by the under-expression of 44 genes encoding structural proteins. Another expected result was that cell growth-related proteins (involved in the regulation of the cell cycle or inducing cell proliferation) and DNA-associated proteins (involved in DNA replication and repair) were up-regulated in transformed MEFs, in agreement with their accelerated growth. It is also not surprising to find altered expression of 56 enzymes involved in cell metabolism because, compared with normal fibroblasts, transformed cells show accelerated growth, increased migration capacity and strong morphological changes. These enzymes could be involved in some of these changes.

Many genes coding for transcription factors (n = 102) and proteins involved in signaling pathways (n = 115) were up- or down-regulated, suggesting that changes in the amounts of these factors could be responsible for the dramatic changes in gene expression observed in transformed cells. It is interesting to note that approximately as many transcription factors were up-regulated (n = 57) as down-regulated (n = 45).

Besides data consistent with previous knowledge, we also found very unexpected results. For example, we found that genes coding for proteases and protease inhibitors were strongly down-regulated by ras V12/E1A transformation. This was surprising, since these factors are up-regulated and strongly involved in tumor progression involving mutated ras. This observation could suggest that, in human cancers, proteases and protease inhibitors are activated through a mechanism disconnected from ras activation. We were similarly surprised that all 8 angiogenic factors present on the array were down-regulated by ras V12/E1A transformation. Like proteases and protease inhibitors, angiogenic factors are involved in tumour progression, yet they were repressed during ras V12/E1A-mediated transformation. It is therefore highly unlikely that their overexpression, reported in several cancers, is controlled by a ras-dependent pathway. Finally, it was also unexpected that only 5 genes involved in protein synthesis were up- or down-regulated, suggesting that protein synthesis is not strongly altered after ras V12/E1A transformation.

Conclusions 

In conclusion, this study of a large number of genes has identified those whose expression is altered upon fibroblast transformation by ras V12/E1A. It is, however, not exhaustive, because the analyzed genes are only representative of the genome (one third of the expressed genes were examined), and post-transcriptional and post-translational modifications were not addressed. Yet the information gathered should be quite useful for future investigations of the molecular mechanisms of oncogenic transformation.

Methods 

Primary mouse embryo fibroblasts (MEFs) 

Primary embryo fibroblasts were isolated from 14.5-day-old SV129J mouse embryos following standard protocols [26]. Cells were grown in Dulbecco's modified Eagle's medium (DMEM) supplemented with 10% foetal calf serum, 2 mM L-glutamine, 100 IU/ml penicillin G and 100 µg/ml streptomycin.

Retroviral infection 

Oncogenic ras transforms most immortal rodent cells to a tumorigenic state, whereas transformation of mouse primary cells requires either a cooperating oncogene or the inactivation of a tumour suppressor gene. The adenovirus E1A oncogene cooperates with ras to transform primary mouse fibroblasts [7] and abrogates ras-induced senescence [27]. Therefore, to obtain transformed fibroblasts, we transduced MEFs with the pBabe-ras V12/E1A retroviral vector, which expresses both the ras V12 mutated protein and the E1A oncogene. pBabe-ras V12/E1A [described in ref. [27]] and pBabe (as control) plasmids were obtained from S. Lowe. Bosc 23 ecotropic packaging cells (10^6) were plated in a 6-well plate, incubated for 24 hr, and then transfected using PEI with 5 µg of retroviral plasmid. After 48 hr, the medium containing the virus was filtered (0.45 µm filter, Millipore) to obtain the first supernatant. MEFs were plated at 2 × 10^5 cells per 35 mm dish and incubated overnight. For infections, the culture medium was replaced by an appropriate mix of the first supernatant and culture medium (v/v), supplemented with 4 µg/ml polybrene (Sigma), and cells were incubated at 37 °C. As a control, we evaluated the ability of the retroviral vector to transduce MEFs by using a retroviral vector expressing EGFP under the control of the retroviral promoter located in the long terminal repeat. About 30% of MEFs expressed high levels of EGFP fluorescence 48 h after transduction (data not shown), indicating that retroviral vectors are well adapted to our experimental set-up. Retrovirus-infected cells were selected with puromycin (0.7 µg/ml). Transformation of MEFs by the pBabe-ras V12/E1A retroviral vector was evaluated by examining changes in their morphological aspect, by quantifying expression of the RAS protein by western blot, and by monitoring cell proliferation, colony formation in soft agar and tumor formation in nude mice. In soft-agar assays, pBabe-ras V12/E1A-transformed cells formed colonies at high frequency (Figure 1). Similarly, transformed cells produced tumors in all (3/3) athymic nude mice when injected subcutaneously, whereas control MEFs did not (0/3) (Figure 1).

Western blot analysis 

One hundred µg of total protein extracted from cells was separated by standard procedures on 12.5% SDS-PAGE using the Mini Protean System (Bio-Rad) and transferred to a nitrocellulose membrane (Sigma). The intracellular level of RAS was estimated by Western blot using the H-ras (C-20) polyclonal antibody (1:200) purchased from Santa Cruz Biotechnology, Inc.

Microarray 

Total RNA was isolated with Trizol (Gibco-BRL by Invitrogen). Twenty µg of total RNA was converted to cDNA with Superscript reverse transcriptase (Gibco-BRL by Invitrogen), using T7-oligo-d(T)24 as a primer. Second-strand synthesis was performed using T4 DNA polymerase and E. coli DNA ligase, followed by blunt ending by T4 polynucleotide kinase. cDNA was isolated by phenol-chloroform extraction using phase lock gels (Brinkmann). cDNA was in vitro transcribed using the T7 BioArray High Yield RNA Transcript Labeling Kit (Enzo Biochem, New York, N.Y.) to produce biotinylated cRNA. Labelled cRNA was isolated using an RNeasy Mini Kit column (Qiagen). Purified cRNA was fragmented to 200-300 mer cRNA using a fragmentation buffer (100 mM potassium acetate-30 mM magnesium acetate-40 mM Tris-acetate, pH 8.1) for 35 min at 94 °C. The quality of total RNA, cDNA synthesis, cRNA amplification and cRNA fragmentation was monitored by micro-capillary electrophoresis (Bioanalyzer 2100, Agilent Technologies). The cRNA probes were hybridized to an MGu74Av2 GeneChip (Affymetrix, Santa Clara, CA). The MGu74Av2 GeneChip represents ~6,000 functionally characterized sequences of mouse UniGene and ~6,000 EST clusters. Each sequence on the chip is represented by 32 probes: 16 "perfect match" (PM) probes that are complementary to the mRNA sequence and 16 "mismatch" (MM) probes that differ only by a single nucleotide at the central base (more detailed information about the MGu74Av2 GeneChip can be obtained at http://www.affymetrix.com). Fifteen micrograms of fragmented cRNA was hybridized for 16 h at 45 °C with constant rotation (60 rpm). Microarrays were processed in an Affymetrix GeneChip Fluidics Station 400. Staining was performed with streptavidin-conjugated phycoerythrin (SAPE), followed by amplification with a biotinylated anti-streptavidin antibody and a second round of SAPE, and arrays were then scanned using an Agilent GeneArray Scanner (Agilent Technologies). The expression value (signal) was calculated using Affymetrix GeneChip software MAS 5.0 (for a full description of the statistical algorithms see http://affymetrix.com/support/technical/whitepapers/sadd_whitepaper.pdf). Briefly, the signal is calculated as follows. First, probe cell intensities are corrected for global background. Then, an MM value is calculated and subtracted to adjust the PM intensity, in order to incorporate some measure of non-specific cross-hybridization to mismatch probes. This value is then log-transformed to stabilize the variance, and the 20 probe pairs representing each gene are consolidated into a single expression level, which is output as the antilog of the resulting value. Finally, the software scales the average intensity of all genes on each array within a data set. The final signal value is considered representative of the amount of transcript in solution.
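The signal computation summarized above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not Affymetrix's implementation. The text does not say how the log-transformed probe values are combined; the sketch uses a one-step Tukey biweight (the robust average described in the MAS 5.0 statistical algorithms document), with tuning constants c = 5 and eps = 1e-4 taken as assumptions. When MM ≥ PM it substitutes a stand-in mismatch of PM/2^0.03 rather than the full MAS 5.0 "ideal mismatch" rule, and it scales each array to an assumed target trimmed-mean signal of 500. All function names are hypothetical.

```python
"""Simplified MAS 5.0-style probe-set signal (a sketch, not Affymetrix code).

Assumptions: intensities are already background-corrected; when MM >= PM a
stand-in mismatch PM / 2**CONTRAST replaces MAS 5.0's full "ideal mismatch"
rule; arrays are scaled to an assumed target trimmed-mean signal of 500.
"""
import math
import statistics

CONTRAST = 0.03     # assumed log2 contrast for the stand-in mismatch
DELTA = 2.0 ** -20  # floor that keeps log2 defined when PM - MM <= 0
TARGET = 500.0      # assumed scaling target


def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight: a mean that down-weights outlying probes."""
    m = statistics.median(values)
    s = statistics.median([abs(v - m) for v in values])
    weights = []
    for v in values:
        u = (v - m) / (c * s + eps)
        weights.append((1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def probe_set_signal(pm, mm):
    """Combine the PM/MM pairs of one probe set into a single signal."""
    log_values = []
    for p, q in zip(pm, mm):
        mismatch = q if q < p else p / 2.0 ** CONTRAST  # non-specific signal to subtract
        log_values.append(math.log2(max(p - mismatch, DELTA)))  # stabilize variance
    return 2.0 ** tukey_biweight(log_values)  # report the antilog


def scale_array(signals, target=TARGET, trim=0.02):
    """Scale every probe-set signal on one array to a common trimmed mean."""
    ordered = sorted(signals)
    k = int(len(ordered) * trim)
    kept = ordered[k:len(ordered) - k] or ordered
    factor = target / statistics.mean(kept)
    return [s * factor for s in signals]


if __name__ == "__main__":
    # Hypothetical probe set: 16 pairs, one PM probe inflated by cross-hybridization.
    pm = [200.0] * 15 + [10000.0]
    mm = [100.0] * 16
    print(round(probe_set_signal(pm, mm), 3))  # the outlying probe receives zero weight
```

The biweight is what makes the consolidation robust: in the example, the single inflated probe pair does not pull the probe-set signal away from the value given by the other 15 pairs.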

The housekeeping genes β-actin and GAPDH serve as endogenous controls and are useful for monitoring the quality of the target. Their respective probe sets are designed to be specific to the 5', middle, or 3' portion of the transcript. The 3'/5' signal ratio from these probe sets is informative about the reverse transcription and in vitro transcription steps of sample preparation. An ideal target, in which all transcripts were transcribed full-length, would give identical 3' and 5' signals, and the ratio would equal 1. A difference greater than three fold between the 3' and 5' signals of these housekeeping genes indicates that RNA was incompletely transcribed or that the target may be degraded. In our experiments, the ratio of fluorescent intensities for the 5' and 3' ends of these housekeeping genes was <2.
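The 3'/5' criterion described above amounts to a simple ratio test. The sketch below, with hypothetical function names and invented signals, flags a housekeeping transcript when its 3' and 5' signals differ by more than three fold. Flagging a 5' excess as well is an assumption; degradation normally lowers the 5' signal.

```python
"""3'/5' ratio check for housekeeping probe sets (illustrative sketch)."""


def three_to_five_ratio(signal_3p: float, signal_5p: float) -> float:
    """Ratio of the 3' to the 5' probe-set signal for one transcript."""
    if signal_3p <= 0 or signal_5p <= 0:
        raise ValueError("signals must be positive")
    return signal_3p / signal_5p


def flag_sample(signal_3p: float, signal_5p: float, limit: float = 3.0) -> bool:
    """True if the 3' and 5' signals differ by more than `limit` fold.

    A full-length target gives a ratio near 1; a large excess of 3' signal
    suggests incomplete reverse/in vitro transcription or degraded RNA.
    """
    ratio = three_to_five_ratio(signal_3p, signal_5p)
    return max(ratio, 1.0 / ratio) > limit


if __name__ == "__main__":
    # Hypothetical (3' signal, 5' signal) pairs for two housekeeping genes.
    for gene, s3, s5 in [("beta-actin", 1800.0, 1000.0), ("GAPDH", 4200.0, 1000.0)]:
        print(gene, round(three_to_five_ratio(s3, s5), 2), flag_sample(s3, s5))
```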

Hybridization experiments were performed in duplicate using independent cRNA probes synthesized with RNA from two independent sets of infected MEFs. Genes were considered differentially expressed when both hybridizations showed a >2 fold change. Data presented in this work represent the average of both hybridizations. The list of unchanged genes can be obtained from the authors upon request.
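The selection rule above (a >2 fold change in both hybridizations, then averaging) can be written as a small filter. This is a sketch with hypothetical names. It assumes that both replicates must change in the same direction and that "the average of both hybridizations" means the mean of the two signed fold changes; the text states neither explicitly.

```python
"""Replicate-consistency filter for two hybridizations (illustrative sketch)."""
from typing import Optional, Tuple

Pair = Tuple[float, float]  # (control signal, transformed signal) from one hybridization


def signed_fc(pair: Pair) -> float:
    """T/C for increases, -C/T for decreases."""
    control, transformed = pair
    if control <= 0 or transformed <= 0:
        raise ValueError("signals must be positive")
    if transformed >= control:
        return transformed / control
    return -control / transformed


def consistent_change(rep1: Pair, rep2: Pair, cutoff: float = 2.0) -> Optional[float]:
    """Mean signed fold change if both replicates exceed `cutoff`, else None."""
    fc1, fc2 = signed_fc(rep1), signed_fc(rep2)
    same_direction = (fc1 > 0) == (fc2 > 0)
    if same_direction and abs(fc1) > cutoff and abs(fc2) > cutoff:
        return (fc1 + fc2) / 2.0
    return None


if __name__ == "__main__":
    # Hypothetical signals for three genes in two hybridizations.
    genes = {
        "kept_up": ((100.0, 300.0), (100.0, 250.0)),    # 3.0 and 2.5 fold up
        "kept_down": ((400.0, 100.0), (500.0, 100.0)),  # 4 and 5 fold down
        "dropped": ((100.0, 300.0), (100.0, 150.0)),    # second replicate only 1.5 fold
    }
    for name, (r1, r2) in genes.items():
        print(name, consistent_change(r1, r2))
```

Requiring agreement between independent hybridizations trades some sensitivity for fewer false positives caused by a single noisy array.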

Validation of gene expression profiles by Northern blot hybridization

Synthesis of probes: One microgram of total RNA from MEF cells was subjected to reverse transcription PCR using the One Step RT-PCR kit (Gibco-BRL) according to the manufacturer's protocol to synthesize specific cDNA probes. PCR was carried out for 32 cycles, each consisting of a denaturing step for 1 min at 94 °C, an annealing step for 2 min at 56 °C, and a polymerization step for 2 min at 72 °C. Selected RNA species were amplified using the following primers: p8, sense, 5'-ggagagagcagactaggcata-3' and antisense, 5'-gttgctgccacccaagggcat-3'; transgelin, sense, 5'-ccagccagctctgcagatggg-3' and antisense, 5'-gcaggcagatttctgagttc-3'; serum amyloid A3, sense, 5'-ggatgaagccttccattgcc-3' and antisense, 5'-gaagagctacaccgccactc-3'; lysyl oxidase, sense, 5'-taaaacgactgtccccaacc-3' and antisense, 5'-tcacggccgttgttagtgta-3'; thrombospondin 2, sense, 5'-aagcccagtcgggcttacgg-3' and antisense, 5'-tgctggagctggagccctgc-3'; extracellular superoxide dismutase, sense, 5'-ccttagttaacccagaaatct-3' and antisense, 5'-gtacctcaaaggtgctcactgg-3'; biglycan, sense, 5'-ggctgctttctgcttcacagg-3' and antisense, 5'-gcaactgaccatcacctccta-3'; myb proto-oncogene, sense, 5'-ctaaaccatttcatgaggag-3' and antisense, 5'-aacaaatgcaaaattcaccc-3'; cytokeratin, sense, 5'-ctggtctcagcagattgagg-3' and antisense, 5'-ggtaggtggcaatctctgcc-3'; high mobility group protein 2, sense, 5'-cgtctgccttctgcctgttttg-3' and antisense, 5'-gcccttgacacggtatgcagc-3'; and ezrin, sense, 5'-caacgaggagaagcggatca-3' and antisense, 5'-gtgtgacacctgcctgcagtg-3'. Specificity of the PCR products was confirmed by direct DNA sequencing.
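As a quick sanity check on primers such as those listed above, GC content and a Wallace-rule melting temperature, Tm ≈ 2 °C × (A+T) + 4 °C × (G+C), can be computed as in the sketch below. The Wallace rule is only a rough estimate intended for short oligonucleotides and ignores salt and primer concentration. It is not presented as the method used to design these primers, and the helper names are hypothetical.

```python
"""GC content and Wallace-rule Tm of a primer (rough sanity-check sketch)."""


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the primer."""
    s = seq.lower()
    return (s.count("g") + s.count("c")) / len(s)


def wallace_tm(seq: str) -> int:
    """Wallace rule: 2 °C per A/T plus 4 °C per G/C (short oligos only)."""
    s = seq.lower()
    at = s.count("a") + s.count("t")
    gc = s.count("g") + s.count("c")
    return 2 * at + 4 * gc


# A few of the primers listed above, copied from the text.
PRIMERS = {
    "p8 sense": "ggagagagcagactaggcata",
    "p8 antisense": "gttgctgccacccaagggcat",
    "myb sense": "ctaaaccatttcatgaggag",
}

if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"{name}: {len(seq)} nt, GC {gc_fraction(seq):.0%}, Tm ~{wallace_tm(seq)} °C")
```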

Northern blot hybridization: RNA samples (10 µg) were subjected to electrophoresis on a 1% agarose gel and vacuum blotted onto Hybond-N membranes (Amersham). The filters were hybridized with the 32P-labeled probes for 16 h at 65 °C in 5X SSPE (1X SSPE is 180 mM NaCl, 1 mM EDTA, 10 mM NaH2PO4, pH 7.5), 5X Denhardt solution, 0.5% SDS and 100 µg/ml single-stranded herring sperm DNA. Filters were then washed four times for 5 min at room temperature in 2X SSC, 0.2% SDS, twice for 15 min at 50 °C in 0.2X SSC, 0.2% SDS, and once for 30 min in 0.1X SSC at 50 °C before autoradiography exposure on Kodak X-Omat films at -80 °C for 8 hr to 4 days.
As we enter an age in which genomics and bioinformatics make possible the discovery of new knowledge about the biological characteristics of an organism, it is critical that we report newly discovered "significant" phenotypes only when they are actually of significance. Given the relative youth of genome-scale gene expression technologies, how to make such distinctions has yet to be well defined. We present a "mask technology" with which to filter out those levels of gene expression that fall within the noise of the experimental techniques being employed. Conversely, our technique can lend validation to significant fold differences in expression level even when the fold value may appear quite small (e.g. 1.3). Given array-organized expression level results from a pair of identical experiments, our ID Mask Tool enables the automated creation of a two-dimensional "region of insignificance" that can then be used in subsequent data analyses. Fundamentally, this should enable researchers to report findings that are more likely to be truly meaningful. Moreover, this can prevent major investments of time, energy, and biological resources in the pursuit of candidate genes that represent false positives.



1 Introduction 

As we enter one of the most exciting times in the history of science, in which 
genomics and bioinformatics are coming together to make possible the discovery of 
new knowledge about living organisms at their molecular level, it is imperative that 
we avoid discovery of "truths" that are not so. While the temptation to plunge into 
tracing out metabolic pathways, cellular interactions, or genetic regulatory circuits — 
especially now that we have technologies allowing genome-wide study of RNA 
expression — is very strong, we must pause long enough to consider how best to 
report our results such that they may be meaningful. Specifically, for microarray- 
based expression technologies, whether they are glass microarrays, nylon 
membranes, or other formats, we need to better understand how to distinguish 
significant fold difference values from those that fall within the noise level of the 
experiment at hand. 



Francis Collins rightfully speculates about the large impact that microarray
technology is likely to have, yet reminds us of the "many critically important
questions about this new field that are yet unaddressed" [1]. Some have criticized
array-based methods for not being model-based, or hypothesis-driven, while others
argue that their exploratory nature can lead to new hypotheses that can then be
tested in the laboratory [2]. Especially because such hypothesis testing of candidate
genes, cell-cell interactions, or pathways requires a major investment of time,
energy, and biological resources, an important challenge is learning to
better recognize false-positive results.

We present a "mask technology" by which to filter out those levels of gene 
expression that fall within the noise of the experimental techniques being employed. 
Conversely, our technique can lend validation to the significance of fold differences 
in expression level even when the fold value may appear quite small. Our work is 
based on the notion that gene expression measurements ought to be repeatable. Fold 
differences for each corresponding pair of genes in a pair of "identical" experiments 
should therefore be equal to unity. Identical experiments are ones in which the 
operating conditions, cell lines, culture media, incubation time, and so forth are 
controlled to be the same. We first explore whether this is the case by examining 
several pairs of identical experiments. We then develop the ID Mask Tool, which 
enables the automated creation of a two-dimensional "region of insignificance" that 
can be used with subsequent data analyses. 

2 Materials and Methods 

2.1 Data Collection 

The data for this study were collected to evaluate the use of microarray technology 
for detection of ESE-1 target genes after transient transfection into different cell 
lines. We hypothesized that a transfection efficiency of greater than 70-80% should 
be sufficient to detect differences in gene expression between two samples. We first 
determined the transfection efficiency of various cell lines using a green fluorescent 
protein (GFP) expression vector. Four of the cell lines tested (HT1080, 293, MCF-7,
and MG-63) met our criteria. Total RNA was isolated from MCF-7 human breast
cancer cells and MG-63 human osteosarcoma cells 20 and 24 hours after transient
transfection with an ESE-1 expression vector. Experiments were performed in
duplicate so that differences in gene expression due to "biological noise" could be
identified. Specifically, six pairs of these duplicated experiments served as the
source of the data that we subsequently used to develop the identity mask
methodology. The ESE-1 expression vector also expressed GFP, which enabled us to
confirm transfection efficiencies for each experiment. ³²P-labeled cDNA probes
reverse-transcribed from these RNAs were
hybridized to the Atlas Human cDNA Expression Arrays from Clontech (Clontech 
Laboratories, Inc., Palo Alto, CA) [3]. Each of these Atlas Arrays (Human 1.2 I, 
Human Cancer) is a nylon membrane on which approximately 1200 human cDNAs 
have been immobilized. The hybridization results were analyzed with the software 
provided by Clontech by normalizing to the signals obtained from housekeeping 
gene controls on the same array as well as by global normalization. The microarray 
experiments were validated by RT/PCR using the same RNAs. 

2.2 Data Analysis and Mask Creation 

We developed the ID Mask Tool, a custom-designed computer program written in 
the C language, to perform mask creation. The ID Mask Tool takes as input two 
spreadsheet files corresponding to two identical experiments, along with two
user-customizable parameters to be discussed below. It returns as output an "identity
mask," or ID Mask, specifically for those two experiments. 

Each spreadsheet contains the names of several hundred genes and their 
corresponding brightness intensity levels (as assessed by hybridization of the probe 
of interest). Only genes present in both files are further considered. For each of 
these genes, we calculate a "fold difference," the ratio of the intensity in file 2 to the 
intensity in file 1 for a given gene. All fold values are then sorted based on the 
corresponding intensity values of the set of genes in the first spreadsheet file. Two 
parameters are used for creation of each identity mask: intensity range (or sliding 
window) size, plus either scale value or number of standard deviations. These are 
used to calculate the ID Mask borders and can be experimented with for better 
results. 
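The pairing and sorting steps above can be sketched in Python as follows. This is a minimal illustration, not the C implementation of the ID Mask Tool; it assumes each spreadsheet has already been read into a dictionary mapping gene name to intensity, and the function name `fold_differences` is hypothetical.

```python
def fold_differences(file1, file2):
    """Return (intensity_in_file1, fold) pairs for genes present in both files.

    file1, file2: dicts mapping gene name -> intensity (hypothetical stand-ins
    for the two spreadsheet files). fold = intensity in file 2 / intensity in
    file 1, and the result is sorted by the file-1 intensity.
    """
    points = []
    for gene in file1.keys() & file2.keys():  # only genes present in both files
        i1, i2 = file1[gene], file2[gene]
        if i1 > 0:  # skip zero intensities to avoid division by zero (assumption)
            points.append((i1, i2 / i1))
    points.sort(key=lambda p: p[0])
    return points


# Identical experiments would ideally give fold values of exactly 1.
f1 = {"geneA": 1000.0, "geneB": 4000.0, "geneC": 2500.0}
f2 = {"geneA": 1100.0, "geneB": 3600.0, "geneD": 10.0}
print(fold_differences(f1, f2))
```

Genes found in only one file (geneC and geneD here) are dropped, matching the rule that only genes present in both files are considered.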

Two methods are then explored for creating identity masks. Method 1 relies on 
segmental calculation of standard deviations. A "data point" refers to an (x, y) 
pairing in which x is an intensity value from the first spreadsheet file and y is its 
corresponding fold difference value (calculated as above). Using all data points in a 
given sliding window of intensity values (e.g., from intensity level 1001 to 2000), 
the standard deviation of the fold values is calculated. The average of the intensity 
values within that window is then paired with a fold value equal to the average fold 
value within that window plus the number of standard deviations specified by the 
user. This new pair becomes a candidate "upper mask border" point. Similarly, a 
candidate "lower mask border" point is created by pairing the average intensity 
value of that window with the average fold value minus the number of standard 
deviations specified by the user. Each successive group of data points in each 
sliding window of intensity values (e.g., all points from 2001 to 3000, then all points
from 3001 to 4000, etc.) likewise gives rise to candidate mask border points. 
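A sketch of Method 1's candidate border points follows, assuming consecutive, non-overlapping windows as in the examples above (1-1000, 1001-2000, ...). The text does not state whether the sample or population standard deviation was used; this sketch assumes the sample standard deviation from Python's `statistics` module and skips windows with fewer than two points. `method1_candidates` is a hypothetical name.

```python
import statistics


def method1_candidates(points, window, n_sd):
    """Candidate upper/lower mask border points from segmental standard deviations.

    points: (intensity, fold) pairs; window: intensity window width;
    n_sd: number of standard deviations added to / subtracted from the mean fold.
    """
    groups = {}
    for x, y in points:
        # Window k covers intensities k*window+1 .. (k+1)*window.
        groups.setdefault(int((x - 1) // window), []).append((x, y))

    upper, lower = [], []
    for k in sorted(groups):
        xs = [x for x, _ in groups[k]]
        ys = [y for _, y in groups[k]]
        if len(ys) < 2:
            continue  # a sample standard deviation needs at least two points
        mean_x = statistics.mean(xs)
        mean_y = statistics.mean(ys)
        sd = statistics.stdev(ys)  # sample SD (assumption)
        upper.append((mean_x, mean_y + n_sd * sd))
        lower.append((mean_x, mean_y - n_sd * sd))
    return upper, lower
```

Each window contributes one upper and one lower candidate point, both placed at the window's average intensity.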

The set of (intensity value, fold value) pairs comprising the candidate upper mask
border points is then fit to a line using least-squares linear regression. This line 
defines the upper mask border. Similarly, linear regression is used to find the lower 
mask border from the set of calculated candidate lower mask border points. If one 
of the derived mask borders fits poorly (based upon relationship to original data 
points), the "reciprocal reflection" of the other mask border can serve in its place. 
This simply means that each (x, y) point on the good-fit (linear) border gives rise to 
a point (x, 1/y) to create the reciprocal reflection border. (See Figures 1 through 6 
for examples of mask borders. Figures 2-5 show ID Masks each consisting of one
linear regression border and one border derived by taking the reciprocal values of 
that linear regression border.) The region between these borders represents the 
"identity" region of insignificant fold differences (i.e., noise). 

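The border-fitting step can be sketched as an ordinary least-squares line through the candidate points, with the reciprocal reflection defined pointwise as (x, 1/y). Note that the reflected border is a curve rather than a straight line. The helper names are hypothetical, and the sketch assumes the fitted border stays positive over the intensity range of interest so that 1/y is defined.

```python
def fit_line(pts):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    b = sxy / sxx
    return mean_y - b * mean_x, b


def linear_border(pts):
    """Border function x -> y from a least-squares line through candidate points."""
    a, b = fit_line(pts)
    return lambda x: a + b * x


def reciprocal_reflection(border):
    """Each point (x, y) on the good-fit border gives (x, 1/y)."""
    return lambda x: 1.0 / border(x)


def inside_mask(x, y, upper, lower):
    """True if (intensity, fold) lies in the identity region between the borders."""
    return lower(x) <= y <= upper(x)


upper = linear_border([(1000, 2.0), (2000, 1.8), (3000, 1.6)])
lower = reciprocal_reflection(upper)
print(inside_mask(1500, 1.0, upper, lower), inside_mask(1500, 2.0, upper, lower))
```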



Figure 1: Identity mask for Experiment A. Method 2 with parameters 9000 for 
intensity sliding window size and 0.975 for scale resulted in the lowest percentage of 
original data points lying outside of the mask region (0.7%). 



Method 2 for creating an identity mask is similar to Method 1 except that candidate 
mask border points are derived from maximal (and minimal) points in each intensity 
window rather than from standard deviation calculations. Specifically, amongst all 
data points in a given window of intensity values, the point with the greatest fold 
value is chosen. This is repeated for each successive window of intensity values. 
These fold values can also be scaled before use in linear regression to find the upper 
mask border. The lower mask border is analogously derived from the smallest fold 
values. 
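Method 2 can be sketched the same way. The text does not spell out how the scale factor is applied to the lower border; this sketch assumes the minimal fold values are divided by the scale so that, for a scale below 1, both borders move toward unity. The window assignment matches Method 1's consecutive windows, and `method2_candidates` is a hypothetical name.

```python
def method2_candidates(points, window, scale):
    """Candidate border points from the extreme fold values in each intensity window.

    points: (intensity, fold) pairs; window: intensity window width;
    scale: factor applied to the extreme fold values (e.g. 0.975 or 1.0).
    """
    groups = {}
    for x, y in points:
        # Window k covers intensities k*window+1 .. (k+1)*window.
        groups.setdefault(int((x - 1) // window), []).append((x, y))

    upper, lower = [], []
    for k in sorted(groups):
        hi_x, hi_y = max(groups[k], key=lambda p: p[1])  # greatest fold value
        lo_x, lo_y = min(groups[k], key=lambda p: p[1])  # smallest fold value
        upper.append((hi_x, hi_y * scale))
        lower.append((lo_x, lo_y / scale))  # assumption: lower border scaled toward 1
    return upper, lower
```

Unlike Method 1, each candidate keeps the intensity of the extreme point itself, so a single outlier in a window directly moves the border.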
Once the ID Mask has been derived, all original data points are checked for
inclusion or exclusion in the identity mask region. The percentage of data points 
lying outside of the mask region is reported. 
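The final check reduces to counting points that fall outside the borders. Here is a minimal sketch, assuming the borders are given as functions of intensity (as a least-squares fit would give) and that points lying exactly on a border count as inside. `percent_outside` is a hypothetical name.

```python
def percent_outside(points, upper, lower):
    """Percentage of (intensity, fold) points lying outside the identity region."""
    outside = sum(1 for x, y in points if not lower(x) <= y <= upper(x))
    return 100.0 * outside / len(points)


# Example with constant borders at fold 1.5 and its reciprocal 1/1.5.
pts = [(1000, 1.0), (2000, 2.0), (3000, 0.5), (4000, 1.2)]
print(percent_outside(pts, lambda x: 1.5, lambda x: 1 / 1.5))  # 50.0
```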







Figure 2: Identity mask for Experiment B. Method 1 with parameters 9000 for
intensity window size and standard deviation of 3 resulted in the lowest percentage
of original data points lying outside of the mask region (1.7%).



Table 1: Numbers of genes present in each of the experiment pairs, along with the
number of genes common to both files in each pair.

          # Genes in    # Genes in    # Genes in
          1st File      2nd File      Both
Expt A    563           559           550
Expt B    292           516           291
Expt C    244           401           244
Expt D    339           518           326
Expt E    365           397           344
Expt F    233           226           180

3 Results 

Six pairs of experiments were performed with Clontech nylon membrane filters and 
tumor cell lines as described in the Methods section, resulting in twelve spreadsheet 
files of genes and their corresponding expression intensity values. The ID Mask 
Tool was used to perform all mask creation experiments as well as basic data 
analysis. Table 1 displays the number of genes present in each of the file pairs, 
along with the number of genes common to both files in each pair. 
Figure 3: Identity mask for Experiment C. Method 1 with parameters 9000 for
intensity window size and standard deviation of 3 resulted in the lowest percentage 
of original data points lying outside of the mask region (2.0%). 



Figure 4: Identity mask for Experiment D. Method 1 with parameters 9000 for 
intensity window size and standard deviation of 3 resulted in the lowest percentage 
of original data points lying outside of the mask region (1.5%).



For both Methods 1 and 2 of ID Mask creation, sliding windows of size 1000, 5000, 
and 9000 on the intensity value axis were chosen for experimentation. Only when 
calculations were not possible with one of these window sizes (e.g., due to division 
by zero) was an alternative window size chosen. For Method 1, the number of
standard deviations (for calculation of candidate mask border points) was chosen to
be 2.5 and 3. For Method 2, the scale factor was chosen to be 0.975 and 1.0.
Figure 5: Identity mask for Experiment E. Method 1 with parameters 5000 for
intensity window size and standard deviation of 3 resulted in the lowest percentage 
of original data points lying outside of the mask region (0.9%). 




Figure 6: Identity mask for Experiment F. Method 1 with parameters 9000 for 
intensity window size and standard deviation of 3 resulted in the lowest percentage 
of original data points lying outside of the mask region (1.7%).



Twelve candidate identity masks were created for each pair of experiments (2 
Methods, times 3 intensity window sizes, times 2 scale or standard deviation 
factors). For each pair of experiments, the ID Mask Tool selected the mask with the 
lowest percentage of original data points lying outside of the mask region. Figures 1 
through 6 show each selected identity mask along with a scatter plot of the original 
(intensity value, fold value) data points for each pair of experiments. Tables 2 and 3 
list the percentages of original data points lying outside of the mask region for each 
of the 12 candidate masks derived for each experiment pair. 
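The selection rule amounts to a small grid search. The sketch below enumerates the 12 combinations described above and keeps the one with the lowest percentage of points outside the mask. The `evaluate` callback, which would build a mask and return its percentage of points outside, is a hypothetical placeholder. The sketch does not model substituting an alternative window size when a calculation fails.

```python
from itertools import product


def select_best_mask(evaluate, windows=(1000, 5000, 9000)):
    """Try 2 methods x 3 window sizes x 2 factors and return the best candidate.

    evaluate(method, window, factor) -> percentage of points outside the mask.
    For Method 1 the factor is a number of standard deviations; for Method 2
    it is a scale factor. Returns (percent_outside, method, window, factor).
    """
    factors = {1: (2.5, 3.0), 2: (0.975, 1.0)}
    candidates = [
        (evaluate(method, window, factor), method, window, factor)
        for method in (1, 2)
        for window, factor in product(windows, factors[method])
    ]
    return min(candidates)  # lowest percentage of points outside the mask wins
```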
Table 2: Each pair of identical experiments gave rise to 12 candidate ID Masks. Six
of these twelve were derived by Method 1 (three with standard deviation of 3 and
three with standard deviation of 2.5). The other six were derived by Method 2 (three
with scale 1.00 and three with scale 0.975). Shown are the percentages of original
data points lying outside of the mask region for each of the 12 candidate ID Masks
derived for each of Experiments A-C. [σ = standard deviation; intensity range
(window) size of 2000 instead of 1000 is used in Experiments B and C for the





Expt 
A 


A 


A 


Expt 
B 


B 


B 


Expt 
C 


C


C


range 


1000 


5000 


9000 


2000; 
1000 


5000 


9000 


2000; 
1000


5000 


9000 


3 


3.1 


2.7 


2.2 


93.2 


80.5 


1.7 


99.6 


100.0 
3.8
100.0


2.4 


Scale 
19.2


6.2 


0.7 


100.0 


99.0 


2.4 


100.0 


100.0 


2.4 


Scale 
0.975 


19.2 


6.2 


0.7 


100.0 


99.3 


2.4 


100.0 


100.0 


2.9 



Table 3: Percentages of original data points lying outside of the mask region for
each of the 12 candidate ID Masks derived for Experiments D-F. (See caption in
Table 2 for further details.) [σ = standard deviation; intensity range (window) size
of 3000 instead of 1000 is used in Experiment D for Method 1 trials, while range
4 Discussion

DNA microarrays clearly are making a large impact on the way we approach 
problems in molecular biology and genomics. These devices are enabling the 
genome-wide study of expression in Escherichia coli K-12, for example [4]. Others 
are using DNA microarrays in the study of B-cell lymphomas [5], growth control 
genes [6], and aging [7]. Some researchers are focusing on developing new [8] or 
using existing [9] clustering techniques to facilitate the analysis of all the data made 
available by this relatively new technology. Few, however, have focused 
specifically on studying the properties of these array data to better understand how 
to distinguish significant from insignificant "findings." 

One way we might be able to better discern meaningful discoveries from the rest is 
by applying an identity mask technology, such as the one we have presented. Our 
experiments show that greater amounts of biological noise are present at lower gene 
expression levels. Thus, there is no magical absolute cut-off for a meaningful fold 
value. There does appear to exist, however, a "mask of insignificant values," 
outside of which the fold values are more likely to represent true significance. In 
Figure 6, for example, a fold difference of 1.5 may be meaningful at an intensity 
level of 60,000, while a fold difference of 2.5 may be insignificant at an intensity 
level of 20,000. This result is in stark contrast to a study by Incyte Pharmaceuticals 
[11], in which they conclude: "any elements with observed ratios greater than or 
equal to 1.8 should be deemed differentially expressed." A brief glance at the 
microarray-related literature will quickly confirm that others are also reporting 
particular fold difference values, such as 1.8, as significant [7]. We argue, however,
that the significance of a fold change depends upon the intensity value; genes that 
are expressed at low levels and hence have weak intensity signals need to show a 
much greater fold difference than highly expressed genes. 
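To make the intensity dependence concrete, the sketch below classifies the two fold differences mentioned above against a hypothetical linear upper border and its reciprocal reflection. The border coefficients (3.5 and 0.00004) are invented for illustration and are not fitted to the data in Figure 6. They were chosen only so that the border narrows as intensity increases.

```python
def upper_border(intensity):
    # Hypothetical upper mask border (NOT fitted to any of the experiments here).
    return 3.5 - 0.00004 * intensity


def is_significant(intensity, fold):
    """A fold value is treated as significant if it lies outside the mask."""
    upper = upper_border(intensity)
    lower = 1.0 / upper  # reciprocal reflection of the upper border
    return not lower <= fold <= upper


print(is_significant(20000, 2.5))  # border is about 2.7 here, so 2.5 falls inside
print(is_significant(60000, 1.5))  # border is about 1.1 here, so 1.5 falls outside
```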

Some have proposed simple statistical tests to determine whether fold differences 
are significant; t-tests, for example, are included in the GeneSpring software 
package (Silicon Genetics, San Carlos, CA). Lee et al. propose a statistical method 
using normal distributions and posterior probabilities to determine the likelihood that 
a gene is truly expressed in a tissue sample [12]. Methods like these are no doubt
important; used alone, however, they may under-emphasize the correlation between 
fold values and intensity values. Future efforts might explore how to best use 
statistical validation techniques in conjunction with the identity mask method. 

While our study used Clontech filters, the general techniques presented for
understanding identity masks of insignificance apply to all types of
expression arrays. Both nylon membrane and glass slide array techniques have their
individual advantages. Nylon membrane arrays offer sensitive detection using
hybridized ³²P-labeled probes. Glass microarrays offer high-resolution fluorescent
detection, dual labeling for hybridizing two probes on a single array, and ease in
automated handling of slides [3]. Richmond et al. compared hybridization of radioactive
cDNAs to spot blots on nylon membranes with fluorescence-based hybridization to
glass microarrays; they found both methods to be reliable and reproducible [4].
Chen describes a colorimetric detection system for use with nylon membranes [13].

Regardless of the specific array format employed, it seems clear that a custom- 
derived identity mask is one method that could help improve appropriate reporting 
of fold difference results. Future work should include an exploration of fitting 
curves rather than lines for the mask borders. The upper mask border in Figure 2, 
for example, may benefit from a fitted curve, or at least a piecewise linear model. 

An alternative method for mask creation might be to always calculate fold 
differences greater than 1 by simply swapping the order of individual intensity 
values whenever the fold value is less than 1 . Only the upper mask border would 
then need to be created. (The lower mask border would be the unity fold difference 
line.) 
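The alternative can be sketched as follows. Whenever the ratio would fall below 1, the two intensities are swapped so that every fold value is at least 1, and a point then only needs to be compared with an upper border. Keeping the file-1 intensity as the x coordinate is an assumption, since the text does not say which intensity is plotted after a swap. The function names are hypothetical.

```python
def folded_points(pairs):
    """Convert (i1, i2) intensity pairs to (i1, fold) with fold >= 1.

    i1, i2: intensities of one gene in the first and second experiment.
    """
    out = []
    for i1, i2 in pairs:
        fold = i2 / i1
        if fold < 1:
            fold = i1 / i2  # swap the order of the intensities
        out.append((i1, fold))
    return out


def inside_folded_mask(x, fold, upper):
    """With folded values, the lower border is the unity line."""
    return 1.0 <= fold <= upper(x)
```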

It is not clear why there were some large differences between the numbers of genes 
detected in the experiment pairs of Experiments B, C, and D. These may have been 
due to experimental error or biological noise. Interestingly, the identity masks for 
these three also do not fit as nicely as those for Experiments A, E, and F. 

While we have selected from amongst the candidate identity masks those with the 
lowest percentages of points outside the mask region, future work might consider 
refining the mask fit to purposely exclude approximately 5% of the data points. This 
could be likened to p < 0.05, in which 5% of the time we may inadvertently report a
result as significant even though it is not. A potential benefit is a closer overall 
mask fit and therefore less likelihood to call a significant finding insignificant. 

In only one out of the six pairs of experiments did Method 2 (scaling values) 
perform better than Method 1 (standard deviations). This is possibly because a
standard deviation is computed from every data point in a window rather than from a
single extreme value, which makes it in general more robust and accurate. One way in
which scaling actual data points can
fail is when there exist outliers. Another is with the choice of too small an intensity 
window size. This can lead to a sort of "overfitting" problem; our group of 
candidate "maximum" points from which to derive the upper mask border may then 
contain several non-maximum values. In Tables 2 and 3, there is a definite trend of 
worsening mask fit as one decreases the intensity range (window) size from 9000 to 
1000. Method 1 is likely to be the more suitable choice in most applications.
Our aim has been to provide a foundation for evaluating fold values. The ultimate
goal is to find truly significant fold differences when performing "treatment versus 
control" comparisons. Analyses of those types of comparisons will likely further our 
understanding of the masking technique as well. Especially because we recognize 
the use of DNA microarrays as a method by which to explore the genome in a 
model-independent fashion [10], it is imperative that we have a basis for judging 
exploratory findings as being important or simply "in the noise." Candidate genes 
found through exploration can lead to investment of significant resources; we need 
to avoid such pursuits of false positive findings. 